Variational Attention for Sequence-to-Sequence Models

Hareesh Bahuleyan¹*, Lili Mou²*, Olga Vechtomova¹, Pascal Poupart¹
¹ University of Waterloo, ON, Canada
² AdeptMind Research, Toronto, Canada

August 22, 2018


Overview

1. Introduction
2. Bypassing Phenomenon
3. Variational Attention
4. Experiments and Results
5. Conclusions



Sequence-to-Sequence Models

- Encoder and decoder are RNNs
- The decoder's hidden state is initialized from the encoder's final state
- Teacher forcing during training
- Predict one word at each timestep
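As a rough illustration of these points, here is a minimal PyTorch sketch (not the authors' code; model sizes and layer choices are assumptions) of an encoder-decoder that initializes the decoder with the encoder's final state and is trained with teacher forcing:

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder; hyperparameters are illustrative only."""
    def __init__(self, vocab_size=10000, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt):
        # Encode the source sequence; keep only the final hidden/cell states.
        _, (h, c) = self.encoder(self.embed(src))
        # Hidden state initialization: the decoder starts from the encoder's state.
        # Teacher forcing: feed the ground-truth target prefix as decoder input.
        dec_out, _ = self.decoder(self.embed(tgt[:, :-1]), (h, c))
        # Predict one word at each timestep (logits over the vocabulary).
        return self.out(dec_out)
```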


Autoencoding (Deterministic)

- Obtain a compressed representation of the data x from which it is possible to reconstruct it
- The encoder $q_\phi(z|x)$ and decoder $p_\theta(x|z)$ are jointly trained to maximize the conditional log-likelihood
- The latent representation z has an arbitrary distribution

Minimize the reconstruction loss:

$$J = -\sum_{n=1}^{N} \sum_{t=1}^{|x^{(n)}|} \log p\left(x_t^{(n)} \mid z^{(n)}, x_{<t}^{(n)}\right)$$
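The double sum above is a per-token negative log-likelihood accumulated over all timesteps and training examples. A minimal sketch of computing it (not the authors' code; tensor shapes and the padding convention are assumptions):

```python
import torch.nn.functional as F

def reconstruction_loss(logits, targets, pad_id=0):
    # logits:  (batch, seq_len, vocab_size) -- decoder outputs modelling p(x_t | z, x_<t)
    # targets: (batch, seq_len)             -- gold token ids x_t
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=pad_id,   # skip padded positions
        reduction="sum",       # sum over timesteps and examples, matching J
    )
```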
