Representation Learning and Sampling for Networks
Tanay Kumar Saha
May 13, 2018
Outline
1. About Me
2. Introduction and Motivation
3. Properties of Algorithm for Learning Representation
4. Representation Learning of Nodes in an Evolving Network
5. Representation Learning of Sentences
6. Substructure Sampling
7. Total Recall
8. Name Disambiguation
About Me
- Defended PhD thesis (Advisors: Mohammad Al Hasan and Jennifer Neville)
- Problems/areas worked on: (1) Latent Representation in Networks, (2) Network Sampling, (3) Total Recall, (4) Name Disambiguation
- Already published: ECML/PKDD (1), CIKM (1), TCBB (1), SADM (1), SNAM (1), IEEE Big Data (1), ASONAM (1), Complenet (1), IEEE CNS (1), BIOKDD (1)
- Poster presentations: RECOMB (1), IEEE Big Data (1)
- Papers under review: KDD (1), JBHI (1), TMC (1)
- In preparation: ECML/PKDD (1), CIKM (1)
- Reproducible research: released code for all the works related to the thesis
- Served as a reviewer: TKDE, TOIS
- Provisional patent applications (3):
  - Apparatus and Method of Implementing Batch-mode Active Learning for Technology-Assisted Review (iControlESI)
  - Apparatus and Method of Implementing Enhanced Batch-Mode Active Learning for Technology-Assisted Review of Documents (iControlESI)
  - Method and System for Log Based Computer Server Failure Diagnosis (NEC Labs)
Data Representation
- For machine learning algorithms, we may need to represent data (i.e., learn a feature-set X) in a d-dimensional space (learn d factors of variation)
  - For classification, we learn a function F that maps a feature-set X to the corresponding label Y, i.e., F : X → Y
  - For clustering, we learn a function F that maps a feature-set X to an unknown label Z, i.e., F : X → Z
- Representation Learning: learn a function that converts the raw data into a suitable feature representation, i.e., F : D [, Y] → X
  - task-agnostic vs. task-sensitive
  - localist vs. distributed
- How do we define the suitability of a feature-set X (quantification/qualification)?
  - Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse
- References: Disentangled Representations in Neural Models by Tenenbaum et al.; Representation Learning: A Review and New Perspectives by Bengio et al.; www.cs.toronto.edu/~bonner/courses/2016s/csc321/webpages/lectures.htm
Data Representation
- What are good criteria for learning a representation (learning X)?
  - There is no clearly defined objective
  - This differs from machine learning tasks such as classification and clustering, where we have a clearly defined objective
- A good representation must disentangle the underlying factors of variation in the training data?
  - How do we translate this objective into appropriate training criteria?
  - Is it even necessary to do anything but maximize likelihood under a good model?
- How do we even decide how many factors of variation are best for an application?
- References: Representation Learning: A Review and New Perspectives by Bengio et al.; (introducing bias) Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Data Representation
- For link prediction in a network, we may represent edges in the space of the total number of nodes in the network (d = |V|)

  Network: a toy graph with nodes 1-5

  Representation of Nodes:
    id   V1   V2
    V1   0    1
    V2   1    0
    V3   1    1
    ...  ...  ...

  Representation of Edges:
    id   V1-V2
    V1   0
    V2   0
    V3   1
    ...  ...

- Edge features: Common Neighbor, Adamic-Adar
- For document summarization, we may represent a particular sentence in the space of the vocabulary/word size (d = |W|)

  Sentence Representation:
    Sent id  Content                  w1  w2  w3  ...
    S1       This place is nice       1   0   1   ...
    S2       This place is beautiful  1   1   0   ...

- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed
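As a concrete illustration of the hand-crafted edge features mentioned above, the sketch below computes Common Neighbor and Adamic-Adar scores for a node pair from an adjacency-set representation. This is an illustrative sketch only; the toy graph and helper names are assumptions, not part of the slides.

```python
import math

def common_neighbors(adj, u, v):
    """Number of shared neighbors of u and v."""
    return len(adj[u] & adj[v])

def adamic_adar(adj, u, v):
    """Adamic-Adar index: shared neighbors weighted by 1/log(degree)."""
    return sum(1.0 / math.log(len(adj[w]))
               for w in adj[u] & adj[v] if len(adj[w]) > 1)

# Toy 5-node graph (illustrative; adjacency sets keyed by node id)
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(common_neighbors(adj, 1, 2))  # shared neighbor: node 3
print(adamic_adar(adj, 1, 2))       # 1/log(3) ≈ 0.91
```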
Data Representation in the Latent Space
- Capture syntactic (homophily) and semantic (structural equivalence) properties of textual units (words, sentences) and network units (nodes, edges)
- For link prediction in a network, we may represent nodes and edges as fixed-length vectors

  Network: a toy graph with nodes 1-5

  Representation of Nodes:
    id   V1   V2
    a1   0.2  0.1
    a2   0.3  0.2
    a3   0.1  0.3

  Representation of Edges:
    id   V1-V2
    a1   0.02
    a2   0.06
    a3   0.03

- Also for document summarization, we may represent a particular sentence as a fixed-length vector (say, in a 3-dimensional space)

  Sentence Representation:
    Sent id  Content                  a1   a2   a3
    S1       This place is nice       0.2  0.3  0.4
    S2       This place is beautiful  0.2  0.3  0.4

- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed
Data Representation (Higher-order feature/substructure as a feature)

Figure: three labeled graphs G1, G2, and G3 (a small graph database)

- Given a set of networks, such as G1, G2, and G3
- Find frequent induced subgraphs of different sizes and use them as features

  Figure: 2-node, 3-node, and 4-node frequent subgraphs mined from G1, G2, and G3

- Similar to learning compositional semantics (learning representations for phrases, sentences, paragraphs, or documents) in the text domain
- A graph can have cycles, so a Tree-LSTM-like recursive structure is not an option
- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed
Data Representation (Higher-order feature/substructure as a feature)
- Given a single large undirected network, find the concentration of 3-, 4-, and 5-node graphlets

  Figure: all 3-, 4-, and 5-node subgraph topologies

- This type of substructure statistic can be used for structural information diffusion in representation learning (within or across modalities of the data)
- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed
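A minimal sketch of the 3-node graphlet concentration idea: count triangles and wedges (paths of length two) in a small undirected graph and report their relative frequencies. The toy graph and function names are illustrative assumptions; the thesis work samples graphlets via MCMC rather than enumerating them, since brute-force enumeration is infeasible for large graphs.

```python
from itertools import combinations

def three_node_graphlet_concentration(adj):
    """Fraction of connected 3-node subgraphs that are triangles vs. wedges."""
    triangles = wedges = 0
    for a, b, c in combinations(list(adj), 3):
        edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if edges == 3:
            triangles += 1
        elif edges == 2:
            wedges += 1  # connected, but one edge missing
    total = triangles + wedges
    return {"triangle": triangles / total, "wedge": wedges / total} if total else {}

# Toy undirected graph as adjacency sets
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2, 5}, 5: {4}}
print(three_node_graphlet_concentration(adj))  # {'triangle': 0.25, 'wedge': 0.75}
```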
Data Representation (Higher-order feature/substructure as a feature)
- Given a single large directed network, find the concentration of 3-, 4-, and 5-node directed graphlets

  Figure: the 13 unique 3-node directed graphlet types ω3,i (i = 1, 2, ..., 13)

- This type of substructure statistic can be used for structural information diffusion in representation learning (within or across modalities of the data)
- (i) Disentangled, Interpretable, Performant, Reusable, Compact (Generalizable?), Invariant, Smooth, Sparse (ii) task-agnostic vs. task-sensitive (iii) localist vs. distributed
Properties of Algorithm for Learning Representation
- What are good criteria for learning representations (learning X)?
  - Representation Learning: F : D [, Y] → X
  - There is no clearly defined objective
  - This differs from machine learning tasks such as classification and clustering, where we have a clearly defined objective
- A good representation must disentangle the underlying factors of variation in the training data?
  - How do we translate this objective into appropriate training criteria?
  - Is it even necessary to do anything but maximize likelihood under a good model?
- References: Representation Learning: A Review and New Perspectives by Bengio et al.; Disentangled Representation for Manipulation of Sentiment in Text
Static vs Dynamic (Evolving) Network
- Static: a single snapshot of a network at a particular time-stamp
- Evolving: multiple snapshots of a network at various time-stamps

Figure: A toy evolving network over nodes 1-5. G1, G2 and G3 are three snapshots of the network.
Latent Representation of Nodes in a Static Network
- Create a corpus from the network: simulate truncated random walks over the nodes (e.g., walks such as 3 → 4 → 5 and 2 → 3 → 4 in the toy network) and treat each walk as a sentence
- Learn the representation by training a skipped version of a language model over the walks, e.g., minimizing −log P(4 | 3) − log P(5 | 4)
- Minimize the negative log-likelihood
- Usually solved using negative sampling instead of a full softmax
- Is it even necessary to do anything but maximize likelihood under a good model?
- Related: word2vec, sen2vec, paragraph2vec, doc2vec || DeepWalk, LINE, Node2Vec || Speech2Vec
- Manifold learning: LLE, ISOMAP; dimensionality reduction: PCA, SVD
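A minimal DeepWalk-style sketch of the pipeline above: generate truncated random walks and feed them to a skip-gram model with negative sampling. This is an illustrative reconstruction, not the thesis code; it assumes gensim ≥ 4 for the Word2Vec API, and the toy adjacency sets are assumptions.

```python
import random
from gensim.models import Word2Vec

def random_walks(adj, num_walks=10, walk_len=5, seed=0):
    """Simulate truncated first-order random walks; each walk is a 'sentence' of node ids."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_len and adj[walk[-1]]:
                walk.append(rng.choice(sorted(adj[walk[-1]])))
            walks.append([str(v) for v in walk])
    return walks

# Toy 5-node network (illustrative adjacency sets)
adj = {1: {3}, 2: {3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
walks = random_walks(adj)

# Skip-gram with negative sampling (sg=1, negative=5) minimizes -log P(context | node)
model = Word2Vec(sentences=walks, vector_size=16, window=2,
                 min_count=0, sg=1, negative=5, epochs=10)
print(model.wv["3"][:4])  # first few dimensions of node 3's embedding
```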
Latent Representation of Nodes in a Static Network
- Most of the existing works cannot capture structural equivalence as advertised
- Lyu et al. show that external information, such as the orbit participation of nodes, may be helpful in this regard
- Is it even necessary to do anything but maximize likelihood under a good model?
- Related: word2vec, sen2vec, paragraph2vec, doc2vec || DeepWalk, LINE, Node2Vec || Speech2Vec
- Reference: Enhancing the Network Embedding Quality with Structural Similarity
Latent Representation in an Evolving Network

Figure: snapshots G1, G2, G3, G4 of a toy evolving network over nodes 1-5, with corresponding embeddings φ1, φ2, φ3, φ4

- The latent representation of a node in an evolving network should be close to its neighbors in the current snapshot
- It should also not drift far from its position in the previous time-step
- (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model?
Our Solution

Figure: Our expectation — snapshots G1, G2, G3, G4 of the toy evolving network with embeddings φ1, φ2, φ3, φ4 that stay smooth over time

Figure: Toy illustration of our method — (a) RET model: DeepWalk embeddings φ1, φ2, φ3 retrofitted with temporal smoothing; (b) Homo LT model: a single transformation matrix W mapping φ1 → φ2 and φ2 → φ3; (c) Heter LT model: separate transformation matrices W1 and W2, one per transition
Solution Sketch

Figure: A conceptual sketch of retrofitting (top) and linear transformation (bottom) based temporal smoothness.
Mathematical Formulation
- Mathematical formulation for the Retrofitted models:

  J(φ_t) = Σ_{v ∈ V} α_v ||φ_t(v) − φ_{t−1}(v)||²   [temporal smoothing]
         + Σ_{(v,u) ∈ E_t} β_{u,v} ||φ_t(u) − φ_t(v)||²   [network proximity]      (1)

- Mathematical formulation for the Homogeneous Transformation model:

  J(W) = ||WX − Z||², where X stacks φ_1, φ_2, ..., φ_{T−1} and Z stacks φ_2, φ_3, ..., φ_T      (2)
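A minimal sketch of how the homogeneous-transformation objective in Eq. (2) could be solved: stack consecutive snapshot embeddings and fit a single W by least squares. Shapes and variable names are illustrative assumptions, not the thesis implementation; note that the slide writes WX with column-vector embeddings, whereas this sketch keeps embeddings as rows, so W acts on the right.

```python
import numpy as np

def fit_homogeneous_transform(phis):
    """phis: list of T embedding matrices, each (num_nodes, d).
    Returns W (d, d) minimizing ||X W - Z||_F^2 with X = [phi_1; ...; phi_{T-1}], Z = [phi_2; ...; phi_T]."""
    X = np.vstack(phis[:-1])   # embeddings at time t
    Z = np.vstack(phis[1:])    # embeddings at time t+1
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return W

# Toy example: 3 snapshots of 5 nodes in a 4-dimensional space
rng = np.random.default_rng(0)
phis = [rng.normal(size=(5, 4)) for _ in range(3)]
W = fit_homogeneous_transform(phis)
print(W.shape)  # (4, 4)
```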
Mathematical Formulation
- Heterogeneous Transformation models:

  J(W_t) = ||W_t φ_t − φ_{t+1}||², for t = 1, 2, ..., (T − 1)      (3)

  (a) Uniform smoothing: weight all projection matrices equally and linearly combine them:

      W^(avg) = (1 / (T − 1)) Σ_{t=1}^{T−1} W_t      (4)

  (b) Linear smoothing: increase the weights of the projection matrices linearly with time:

      W^(linear) = Σ_{t=1}^{T−1} (t / (T − 1)) W_t      (5)

  (c) Exponential smoothing: increase the weights exponentially, using an exponential operator (exp) and a weighted-collapsed tensor (wct):

      W^(exp) = Σ_{t=1}^{T−1} exp(t / (T − 1)) W_t      (6)

      W^(wct) = Σ_{t=1}^{T−1} (1 − θ)^{T−1−t} W_t      (7)
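A small sketch of the smoothing step, combining the per-transition projection matrices W_1, ..., W_{T−1} with the weighting schemes of Eqs. (4)-(7). The toy matrices and the default θ value are illustrative assumptions.

```python
import numpy as np

def smooth_projections(Ws, strategy="avg", theta=0.3):
    """Combine per-step projection matrices W_1..W_{T-1} into a single W, following Eqs. (4)-(7)."""
    n = len(Ws)  # n = T - 1
    if strategy == "avg":        # Eq. (4): uniform weights
        weights = [1.0 / n] * n
    elif strategy == "linear":   # Eq. (5): weights grow linearly with t
        weights = [t / n for t in range(1, n + 1)]
    elif strategy == "exp":      # Eq. (6): weights grow exponentially with t
        weights = [np.exp(t / n) for t in range(1, n + 1)]
    elif strategy == "wct":      # Eq. (7): weighted-collapsed tensor, recent steps count more
        weights = [(1 - theta) ** (n - t) for t in range(1, n + 1)]
    else:
        raise ValueError(strategy)
    return sum(w * W for w, W in zip(weights, Ws))

# Toy: three 4x4 projection matrices
rng = np.random.default_rng(1)
Ws = [rng.normal(size=(4, 4)) for _ in range(3)]
print(smooth_projections(Ws, "wct").shape)  # (4, 4)
```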
Similarity in Other Modality
- Exploiting Similarities among Languages for Machine Translation
- Can Network Embedding of Distributional Thesaurus be Combined with Word Vectors for Better Representation?
- Retrofitting Word Vectors to Semantic Lexicons
- (vision, text, knowledge graph) Cross-modal Knowledge Transfer: Improving the Word Embeddings of Apple by Looking at Oranges
- Disentangled Representations for Manipulation of Sentiment in Text
  - "unfortunately, this is a bad movie that is just plain bad"
  - "overall, this is a good movie that is just good"
- Deep Manifold Traversal: Changing Labels with Convolutional Features
  - (vision) Transform a smiling portrait into an angry one, or make one individual look more like someone else, without changing clothing and background
- Controllable Text Generation
- Learning to Generate Reviews and Discovering Sentiment
- A Neural Algorithm of Artistic Style
Motivation (Latent Representation of Sentences)
- Most existing Sen2Vec methods disregard the context of a sentence
- The meaning of one sentence depends on the meaning of its neighbors
  - "I eat my dinner. Then I take some rest. After that I go to bed."
- Our approach: incorporate extra-sentential context into Sen2Vec
- We propose two methods: regularization and retrofitting
- We experiment with two types of context: discourse and similarity
- References: Regularized and Retrofitted Models for Learning Sentence Representation with Context; CON-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec
Motivation (Discourse is Important)
- A simple strategy of decoding the concatenation of the previous and current sentence leads to good performance
- A novel strategy of multi-encoding and decoding of two sentences leads to the best performance
- Target-side context is important in translation
- Reference: Evaluating Discourse Phenomena in Neural Machine Translation
Our Approach
- Consider the content as well as the context of a sentence
- Treat the context sentences as atomic linguistic units
  - Similar in spirit to (Le & Mikolov, 2014)
  - Efficient to train compared to compositional methods like encoder-decoder models (e.g., SDAE, Skip-Thought)
- Related models: Sen2Vec, SDAE, SAE, FastSent, Skip-Thought, w2v-avg, C-PHRASE
Content Model (Sen2Vec)
- Treats sentences and words similarly
- Both are represented by vectors in a shared embedding matrix, φ : V → R^d (look-up)
- Example sentence v: "I eat my dinner"

Figure: Distributed bag of words, or DBOW (Le & Mikolov, 2014) — the sentence vector φ(v) is trained to predict the words "I", "eat", "my", "dinner"
Regularized Models (Reg-dis, Reg-sim)
- Incorporate the neighborhood directly into the objective function of the content-based model (Sen2Vec) as a regularizer
- Objective function:

  J(φ) = Σ_{v ∈ V} [ L_c(v) + β L_r(v, N(v)) ]
       = Σ_{v ∈ V} [ L_c(v) + β Σ_{(v,u) ∈ E} ||φ(u) − φ(v)||² ]      (8)

  where L_c(v) is the content loss and the second term performs graph smoothing
- Train with SGD
- Regularization with discourse context ⇒ Reg-dis
- Regularization with similarity context ⇒ Reg-sim
- Reference: CON-S2V: A Generic Framework for Incorporating Extra-Sentential Context into Sen2Vec
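A minimal sketch of the graph-smoothing part of Eq. (8): an SGD-style update that pulls a sentence vector toward its context neighbors, while the (omitted) content loss handles word prediction. The arrays, β, and learning rate are illustrative assumptions, not the CON-S2V implementation.

```python
import numpy as np

def smoothing_grad(phi, v, neighbors):
    """Gradient of sum_u ||phi[v] - phi[u]||^2 with respect to phi[v], up to a factor of 2."""
    return sum(phi[v] - phi[u] for u in neighbors)

def reg_update(phi, v, neighbors, beta=0.3, lr=0.05):
    """One SGD step on the graph-smoothing regularizer for sentence v; content-loss step omitted."""
    phi[v] -= lr * 2 * beta * smoothing_grad(phi, v, neighbors)

# Toy embeddings: 4 sentences in a 3-dimensional space; sentence 0 has neighbors 1 and 2
phi = {i: np.random.default_rng(i).normal(size=3) for i in range(4)}
reg_update(phi, 0, [1, 2])
print(phi[0])
```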
Pictorial Depiction

Figure: (a) A sequence of sentences — u: "And I was wondering about the GD LEV.", v: "Is it reusable?", y: "Or is it discarded to burn up on return to LEO?"; (b) Sen2Vec (DBOW) predicts the words "is", "it", "reusable" from φ(v) via the content loss L_c; (c) Reg-dis additionally ties φ(v) to its neighbors u and y via the smoothing loss L_r
Retrofitted Model (Ret-dis, Ret-sim)
- Retrofit the vectors learned by Sen2Vec so that the revised vector φ(v) is:
  - similar to the prior vector φ_0(v)
  - similar to the vectors of its neighboring sentences, φ(u)
- Objective function:

  J(φ) = Σ_{v ∈ V} α_v ||φ(v) − φ_0(v)||²   [close to prior]
       + Σ_{(v,u) ∈ E} β_{u,v} ||φ(u) − φ(v)||²   [graph smoothing]      (9)

- Solve using the Jacobi iterative method
- Retrofit with discourse context ⇒ Ret-dis
- Retrofit with similarity context ⇒ Ret-sim
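A minimal sketch of the Jacobi-style solution to Eq. (9): each vector is repeatedly reset to a weighted average of its prior vector and its current neighbors. Uniform α and β and the toy inputs are simplifying assumptions.

```python
import numpy as np

def retrofit(phi0, edges, alpha=1.0, beta=1.0, iters=10):
    """Jacobi iterations for Eq. (9). phi0: dict id -> prior vector; edges: dict id -> neighbor ids."""
    phi = {v: vec.copy() for v, vec in phi0.items()}
    for _ in range(iters):
        new_phi = {}
        for v, neighbors in edges.items():
            num = alpha * phi0[v] + beta * sum(phi[u] for u in neighbors)
            den = alpha + beta * len(neighbors)
            new_phi[v] = num / den
        phi.update(new_phi)
    return phi

# Toy: 3 sentence vectors; sentence 0 is adjacent to sentences 1 and 2
phi0 = {i: np.array([float(i), 1.0]) for i in range(3)}
edges = {0: [1, 2], 1: [0], 2: [0]}
print(retrofit(phi0, edges)[0])
```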
Frequent Subgraph Mining (Sampling Substructure)
- Perform a first-order random walk over the fixed-size substructure space
- The Metropolis-Hastings (MH) algorithm calculates the acceptance probability using:

  α(x, y) = min( (π(y) q(y, x)) / (π(x) q(x, y)), 1 )      (10)

- For mining frequent substructures from a set of graphs, we use average support (s1) or set-intersection support (s2) as the target distribution, i.e., π = s1 or π = s2
- For collecting statistics from a single large graph, we use the uniform probability distribution as our target distribution
- In both cases, we use the uniform distribution as our proposal distribution, i.e., q(x, y) = 1/dx
- References: F-S-Cube: A Sampling Based Method for Top-k Frequent Subgraph Mining; Finding Network Motifs Using MCMC Sampling; Discovery of Functional Motifs from the Interface Region of Oligomeric Proteins Using Frequent Subgraph Mining; ACTS: Extracting Android App Topological Signature through Graphlet Sampling
Data Representation (Frequent Subgraph Mining)

Figure: a graph database of three labeled graphs G1, G2, G3 and the frequent induced subgraphs mined from them (2-node and 3-node frequent subgraphs)

- We find the support-sets of the edges BD, BE and DE of g13, which are {G1, G2, G3}, {G2, G3}, and {G2, G3} respectively
- So, for g_BDE, s1(g_BDE) = (3 + 2 + 3) / 3 ≈ 2.67, and s2(g_BDE) = 2
Frequent Subgraph Mining (Sampling Substructure)

Figure: neighbor generation mechanism. (a) Left: a graph G with the current state of the random walk (nodes 1, 2, 3, 4); right: neighborhood information of the current state. (b) Left: the state of the random walk on G after one transition; right: updated neighborhood information.

- For this example, dx = 21 and dy = 13
Frequent Subgraph Mining (Sampling Substructure)

Algorithm 1: SampleIndSubGraph
  Input: graph Gi; size of subgraph, ℓ
  x ← state saved at Gi
  dx ← neighbor count of x
  a_sup_x ← score of graph x
  while a neighbor state y is not found do
      y ← a random neighbor of x
      dy ← neighbor count of y
      a_sup_y ← score of graph y
      accp_val ← (dx · a_sup_y) / (dy · a_sup_x)
      accp_probability ← min(1, accp_val)
      if uniform(0, 1) ≤ accp_probability then
          return y
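A runnable sketch of the acceptance step in Algorithm 1, following the MH rule of Eq. (10) with a uniform proposal q(x, y) = 1/dx. The state representation and scoring function below are illustrative assumptions; the real sampler walks over fixed-size induced subgraphs rather than integers.

```python
import random

def mh_step(x, neighbors, degree, score, rng=random):
    """One Metropolis-Hastings transition: propose a uniform random neighbor y of x and
    accept with probability min(1, (d_x * score(y)) / (d_y * score(x))), as in Algorithm 1."""
    while True:
        y = rng.choice(neighbors(x))
        accp_val = (degree(x) * score(y)) / (degree(y) * score(x))
        if rng.random() <= min(1.0, accp_val):
            return y

# Toy state space: integers 0..4 on a cycle, with an arbitrary positive score (target ∝ score)
neighbors = lambda s: [(s - 1) % 5, (s + 1) % 5]
degree = lambda s: len(neighbors(s))
score = lambda s: s + 1.0
x = 0
for _ in range(100):
    x = mh_step(x, neighbors, degree, score)
print(x)
```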
Frequent Subgraph Mining (Sampling Substructure)

Algorithm 2: SampleIndSubGraph (uniform target distribution, so only degrees matter)
  Input: graph Gi; size of subgraph, ℓ
  x ← state saved at Gi
  dx ← neighbor count of x
  while a neighbor state y is not found do
      y ← a random neighbor of x
      dy ← neighbor count of y
      accp_val ← dx / dy
      accp_probability ← min(1, accp_val)
      if uniform(0, 1) ≤ accp_probability then
          return y

- Reference: Motif Counting Beyond Five Nodes
Frequent Subgraph Mining (Sampling Substructure)
- The random walks are ergodic
- They satisfy the reversibility condition, so the walk achieves the target distribution
- We use the spectral gap (λ = 1 − max{λ1, |λm−1|}) to measure the mixing rate of our random walk
- We compute the mixing time (inverse of the spectral gap) for size-6 subgraphs of the Mutagen dataset and find that it is approximately 15 units
- We suggest using multiple chains along with a suitable distance measure (e.g., Jaccard distance) for choosing a suitable iteration count
- We show that the acceptance probability of our technique is quite high (a large number of rejected moves would indicate a poorly designed proposal distribution)

Table: Probability of acceptance of FS3 for the Mutagen and PS datasets

                              Mutagen                                       PS
                              ℓ=8            ℓ=9            ℓ=10           ℓ=6            ℓ=7            ℓ=8
  Acceptance (%), strategy s1 82.70 ± 0.04   83.89 ± 0.03   81.66 ± 0.03   91.08 ± 0.01   92.23 ± 0.02   93.08 ± 0.01
  Acceptance (%), strategy s2 75.27 ± 0.05   76.74 ± 0.03   75.20 ± 0.03   85.08 ± 0.05   87.46 ± 0.06   89.41 ± 0.07
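A small sketch of the spectral-gap measurement mentioned above: take the eigenvalues of a reversible transition matrix, compute λ = 1 − max{λ1, |λm−1|} (with eigenvalues sorted 1 = λ0 ≥ λ1 ≥ ... ≥ λm−1), and report its inverse as a mixing-time proxy. The toy chain below is an illustrative assumption, not a subgraph-space walk.

```python
import numpy as np

def spectral_gap(P):
    """Spectral gap of a transition matrix P: 1 - max{lambda_1, |lambda_{m-1}|}."""
    eig = np.sort(np.linalg.eigvals(P).real)[::-1]  # descending; eig[0] == 1 for a stochastic P
    return 1.0 - max(eig[1], abs(eig[-1]))

# Toy reversible (birth-death) chain on 4 states; rows sum to 1
P = np.array([
    [0.50, 0.50, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.50, 0.50],
])
gap = spectral_gap(P)
print(gap, 1.0 / gap)  # spectral gap and the corresponding mixing-time proxy
```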
The Problem of Total Recall
- Vanity search: find out everything about me
- Fandom: find out everything about my hero
- Research: find out everything about my PhD topic
- Investigation: find out everything about something or some activity
- Systematic review: find all published studies evaluating some method or effect
- Patent search: find all prior art
- Electronic discovery: find all documents responsive to a request for production in a legal matter
- Creating archival collections: label all relevant documents, for posterity, future IR evaluation, etc.
- References: Batch-Mode Active Learning for Technology-Assisted Review; A Large Scale Study of SVM Based Methods for Abstract Screening in Systematic Reviews
An Active Learning Algorithm

Algorithm 3: SelectABatch
  Input: hc, current hyperplane; D, available instances; k, batch size; t, similarity threshold
  Output: a batch of k documents to be included in training
  Bc ← EmptySet()
  if Strategy is DS then
      I ← ArgSort(Distance(hc, D), order = increasing)
      while Size(Bc) < k do
          Insert(Bc, I[1])
          S ← GetSimilar(I[1], I, D, t, similarity = cosine)
          I ← Remove(I, S)
  else if Strategy is BPS then
      w ← 1.0 / Distance(hc, D)²
      w ← Normalize(w)
      I ← List(D)
      while Size(Bc) < k do
          c ← Choose(I, prob = w, num = 1)
          Insert(Bc, c)
          S ← GetSimilar(c, I, D, t, similarity = cosine)
          I ← Remove(I, S)
          w ← Normalize(w[I])
  return Bc
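A minimal sketch of the DS branch of Algorithm 3: repeatedly pick the document closest to the current hyperplane and drop remaining candidates that are too similar to it. The feature matrix, margins, and threshold value are illustrative assumptions, not the iControlESI implementation.

```python
import numpy as np

def select_batch_ds(distances, X, k, t):
    """DS batch selection: distances is the (n,) |margin| per document;
    X is an (n, d) matrix of L2-normalized feature vectors; t is the cosine-similarity threshold."""
    order = list(np.argsort(distances))          # closest to the hyperplane first
    batch = []
    while order and len(batch) < k:
        pick = order.pop(0)
        batch.append(pick)
        sims = X[order] @ X[pick]                # cosine similarity (rows are unit vectors)
        order = [doc for doc, s in zip(order, sims) if s <= t]
    return batch

# Toy example: 6 documents in 3 dimensions with random margins (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
print(select_batch_ds(rng.random(6), X, k=2, t=0.9))
```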
Name Disambiguation
- The graph in this figure corresponds to the ego network of u, denoted Gu
- We also assume that u is a multi-node consisting of two name entities
- So the removal of node u (along with all of its incident edges) from Gu leaves two disjoint clusters

Figure: A toy example of clustering-based entity disambiguation

- (i) A good representation must disentangle the underlying factors of variation in the training data (ii) Is it even necessary to do anything but maximize likelihood under a good model?
Name Disambiguation
- Calculate the Normalized-Cut score:

  NC = Σ_{i=1}^{k} W(Ci, C̄i) / (W(Ci, Ci) + W(Ci, C̄i))      (11)

- Model temporal mobility

  Figure: number of papers per year for Cluster1, Cluster2, and Cluster3

- Calculate the Temporal-Mobility score:

  TM-score = [ Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} w(Zi, Zj) · (D(Zi ‖ Zj) + D(Zj ‖ Zi)) ] / [ k × Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} w(Zi, Zj) ]      (12)
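A small sketch of the normalized-cut score of Eq. (11) for a weighted adjacency matrix and a cluster assignment; the toy matrix and variable names are illustrative assumptions.

```python
import numpy as np

def normalized_cut(W, labels):
    """Eq. (11): sum over clusters of W(Ci, C̄i) / (W(Ci, Ci) + W(Ci, C̄i)),
    where W(A, B) sums the edge weights between node sets A and B."""
    nc = 0.0
    for c in np.unique(labels):
        inside = labels == c
        w_in = W[np.ix_(inside, inside)].sum()
        w_out = W[np.ix_(inside, ~inside)].sum()
        nc += w_out / (w_in + w_out)
    return nc

# Toy weighted graph with two obvious clusters {0, 1} and {2, 3}
W = np.array([
    [0.0, 2.0, 0.1, 0.0],
    [2.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 2.0],
    [0.0, 0.1, 2.0, 0.0],
])
print(normalized_cut(W, np.array([0, 0, 1, 1])))  # small value → good clustering
```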
Thanks!
- Sen2Vec Code and Datasets: https://github.com/tksaha/con-s2v/tree/jointlearning
- Temporal Node2Vec Code: https://gitlab.com/tksaha/temporalnode2vec.git
- Motif Finding Code: https://github.com/tksaha/motif-finding
- Frequent Subgraph Mining Code: https://github.com/tksaha/fs3-graph-mining
- Finding Functional Motif Code: https://gitlab.com/tksaha/func motif