Efficient Attention using a Fixed-Size Memory Representation

Denny Britz∗ and Melody Y. Guan∗ and Minh-Thang Luong
Google Brain
dennybritz,melodyguan,[email protected]

arXiv:1707.00110v1 [cs.CL] 1 Jul 2017

Abstract

The standard content-based attention mechanism typically used in sequence-to-sequence models is computationally expensive, as it requires comparing large encoder and decoder states at each time step. In this work, we propose a more efficient alternative attention mechanism based on a fixed-size memory representation. Our technique predicts a compact set of K attention contexts during encoding and lets the decoder compute an efficient lookup that does not need to consult the memory. We show that our approach performs on par with the standard attention mechanism while yielding inference speedups of 20% for real-world translation tasks and more for tasks with longer sequences. By visualizing attention scores, we demonstrate that our models learn distinct, meaningful alignments.

∗ Equal contribution. Author order alphabetical.

1 Introduction

Sequence-to-sequence models (Sutskever et al., 2014; Cho et al., 2014) have achieved state-of-the-art results across a wide variety of tasks, including Neural Machine Translation (NMT) (Bahdanau et al., 2014; Wu et al., 2016), text summarization (Rush et al., 2015; Nallapati et al., 2016), speech recognition (Chan et al., 2015; Chorowski and Jaitly, 2016), image captioning (Xu et al., 2015), and conversational modeling (Vinyals and Le, 2015; Li et al., 2015). The most popular approaches are based on an encoder-decoder architecture consisting of two recurrent neural networks (RNNs) and an attention mechanism that aligns target to source tokens (Bahdanau et al., 2014; Luong et al., 2015). The typical attention mechanism used in these architectures computes a new attention context at each decoding step based on the current state of the decoder. Intuitively, this corresponds to looking at the source sequence after the output of every single target token.

Inspired by how humans process sentences, we believe it may be unnecessary to look back at the entire original source sequence at each step.1 We thus propose an alternative attention mechanism (section 3) with lower computational time complexity. Our method predicts K attention context vectors while reading the source, and learns to use a weighted average of these vectors at each step of decoding. Thus, we avoid looking back at the source sequence once it has been encoded. We show (section 4) that this speeds up inference while performing on par with the standard mechanism on both toy and real-world WMT translation datasets. We also show that our mechanism leads to larger speedups as sequences get longer. Finally, by visualizing the attention scores (section 5), we verify that the proposed technique learns meaningful alignments, and that different attention context vectors specialize in different parts of the source.
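The contrast drawn above can be illustrated with a minimal sketch. It is not the parameterization used in this work: the dot-product scoring, the names `slot_queries` and `slot_keys` (standing in for learned parameters), and all helper functions are assumptions made for illustration only. It shows standard attention scoring every source state at each decoding step, whereas the fixed-size variant builds K context vectors once after encoding and then scores only those K slots per step.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Weighted average of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors))
            for d in range(dim)]


def standard_attention(decoder_state, encoder_states):
    # Scores all m source states at every decoding step.
    scores = [dot(decoder_state, s) for s in encoder_states]
    return weighted_sum(softmax(scores), encoder_states)


def build_memory(encoder_states, slot_queries):
    # Runs once after encoding. Each of the K slots is a softmax-weighted
    # average of the source states. `slot_queries` is a hypothetical stand-in
    # for learned parameters.
    memory = []
    for query in slot_queries:
        scores = [dot(query, s) for s in encoder_states]
        memory.append(weighted_sum(softmax(scores), encoder_states))
    return memory


def memory_attention(decoder_state, memory, slot_keys):
    # Per decoding step: K scores, independent of the source length m.
    # `slot_keys` is a hypothetical stand-in for learned parameters.
    scores = [dot(decoder_state, key) for key in slot_keys]
    return weighted_sum(softmax(scores), memory)


if __name__ == "__main__":
    random.seed(0)
    dim, source_len, num_slots = 4, 6, 2

    def rand_vec():
        return [random.uniform(-1.0, 1.0) for _ in range(dim)]

    encoder_states = [rand_vec() for _ in range(source_len)]
    slot_queries = [rand_vec() for _ in range(num_slots)]
    slot_keys = [rand_vec() for _ in range(num_slots)]

    memory = build_memory(encoder_states, slot_queries)  # once per source
    decoder_state = rand_vec()
    print(standard_attention(decoder_state, encoder_states))
    print(memory_attention(decoder_state, memory, slot_keys))
```

In this sketch the per-step cost of `memory_attention` depends on K rather than on the source length, which is consistent with the claim that speedups grow as sequences get longer.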

2 Background

2.1 Sequence-to-Sequence Model with Attention

Our models are based on an encoder-decoder architecture with an attention mechanism (Bahdanau et al., 2014; Luong et al., 2015). An encoder function takes as input a sequence of source tokens $x = (x_1, \ldots, x_m)$ and produces a sequence of states $s = (s_1, \ldots, s_m)$. The decoder is an RNN that predicts the probability of a target sequence $y = (y_1, \ldots, y_T \mid s)$. The probability of each target token $y_i \in \{1, \ldots, |V|\}$ is predicted based on the recurrent state in the decoder RNN, $h_i$, the previous words, y