TOWARDS THE USE OF DEEP REINFORCEMENT LEARNING WITH GLOBAL POLICY FOR QUERY-BASED EXTRACTIVE SUMMARISATION
Diego Mollá

PROBLEM
• Supervised machine learning approaches to text summarisation are usually based on predicted scores of individual sentences/extracts.
• The resulting system is therefore not optimised for the global, multi-sentence summary.

ACTIONS, REWARD
1. The agent needs to decide whether sentence i is part of the summary or not.
2. The reward is delayed until all n sentences have been processed.
3. The reward of the summary is its ROUGE_L score:

   r = 0        if i < n
   r = ROUGE_L  if i = n
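To make the delayed reward concrete, here is a minimal Python sketch. The environment class, the action encoding (1 = keep sentence i, 0 = skip it) and the simplified LCS-based ROUGE-L are assumptions for illustration only; the actual implementation is in the bioasq-rl repository linked under Source Code, and the next state (described later under "The State") is omitted here to keep the sketch focused on the reward.

# Sketch of the delayed-reward rule above (illustrative assumptions throughout).

def rouge_l(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F-score based on the longest common subsequence of tokens."""
    c, ref = candidate.split(), reference.split()
    if not c or not ref:
        return 0.0
    dp = [[0] * (len(ref) + 1) for _ in range(len(c) + 1)]
    for i, x in enumerate(c, 1):
        for j, y in enumerate(ref, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    prec, rec = lcs / len(c), lcs / len(ref)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

class DelayedRewardEnv:
    """One episode = one document: decide sentence by sentence, reward at the end."""

    def __init__(self, sentences, reference_summary):
        self.sentences = sentences
        self.reference = reference_summary
        self.i = 0            # index of the sentence under consideration
        self.selected = []    # sentences kept in the summary so far

    def step(self, action):
        if action == 1:
            self.selected.append(self.sentences[self.i])
        self.i += 1
        done = self.i == len(self.sentences)
        # r = 0 if i < n;  r = ROUGE_L of the whole summary if i = n
        reward = rouge_l(" ".join(self.selected), self.reference) if done else 0.0
        return reward, done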
CONTRIBUTIONS
1. Focus on query-based extractive summarisation.
2. Use reinforcement learning to directly optimise the final multi-sentence summary.
3. Learn a global policy using policy gradient.
REINFORCEMENT LEARNING
[Diagram: the agent sends action a to the environment; the environment returns state s, reward r, and flag done.]
• a: Action made by the agent.
• r: Reward given to the action.
• s: State returned after applying the action.
• done: True if an episode has completed.
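The diagram corresponds to the usual agent-environment loop; a minimal sketch is below, with a hypothetical agent object, and env.reset/env.step following the interface used in the Policy Gradient box.

# Illustrative agent-environment loop matching the diagram above.

def run_episode(env, agent, sample):
    """Process one document: the agent decides, per sentence, whether to keep it."""
    s = env.reset(sample)          # initial state for this document
    done = False
    total_reward = 0.0
    while not done:
        a = agent.act(s)           # action: keep or skip the current sentence
        s, r, done = env.step(a)   # next state, reward (0 until the end), done flag
        total_reward += r
    return total_reward            # equals the final ROUGE_L of the summary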
THE GLOBAL POLICY
1. Use a training set to learn a global policy.
2. The global policy predicts the probability that selecting sentence i would give the highest reward.
3. Implemented as a neural network with one hidden layer.
   • The final unit is a Bernoulli logistic unit.
4. Trained using policy gradient.

   Pr(a = 0 | s; Wh, Ws, bh, bs) = o
   o = σ(h · Wh + bh)
   h = relu(s · Ws + bs)
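These equations translate almost line by line into NumPy; the sketch below is illustrative only, with assumed dimensions and initialisation (the poster specifies just one hidden ReLU layer and a Bernoulli logistic output unit).

import numpy as np

# Sketch of the one-hidden-layer policy network defined by the equations above.
# Dimensions and initialisation are illustrative assumptions.
state_size, hidden_size = 100, 50
rng = np.random.default_rng(0)
Ws = rng.normal(scale=0.1, size=(state_size, hidden_size)); bs = np.zeros(hidden_size)
Wh = rng.normal(scale=0.1, size=(hidden_size, 1));          bh = np.zeros(1)

def policy(s):
    """Return Pr(a = 0 | s), the output of the Bernoulli logistic unit."""
    h = np.maximum(0.0, s @ Ws + bs)           # h = relu(s · Ws + bs)
    o = 1.0 / (1.0 + np.exp(-(h @ Wh + bh)))   # o = sigma(h · Wh + bh)
    return float(o)                            # Pr(a = 0 | s; Wh, Ws, bh, bs)

# Example: a random state vector
print(policy(rng.normal(size=state_size)))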
POLICY GRADIENT
The learning algorithm for the global policy is a variant of the REINFORCE algorithm [1] that uses gradient descent with cross-entropy gradients multiplied by the reward [2, Chapter 16].

Data: train_data
Result: θ = (Wh, bh, Ws, bs)
sample ∼ Uniform(train_data);
s ← env.reset(sample);
all_gradients ← ∅;
episode ← 0;
while True do
    ξ ∼ Bernoulli((Pr(a = 0) + p) / (1 + 2 × p));   (p is defined under Early Exploration)
    y ← 1 − ξ;
    gradient ← ∇θ cross_entropy(y, Pr(a = 0));
    all_gradients.append(gradient);
    s, r, done ← env.step(ξ);
    episode ← episode + 1;
    if done then
        θ ← θ − α × r × mean(all_gradients);
        sample ∼ Uniform(train_data);
        s ← env.reset(sample);
        all_gradients ← ∅;
    end
end
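A sketch of this loop using TensorFlow (the toolkit of reference [2]) follows. The env object follows the pseudocode's reset/step interface; the layer sizes, learning rate and finite step budget are assumptions, and the Early Exploration smoothing is left out to keep the sketch short.

import random
import tensorflow as tf

# Sketch of the REINFORCE-style loop above, using tf.GradientTape for the
# cross-entropy gradients.  Sizes, learning rate and step budget are illustrative.

state_size, hidden_size, alpha = 100, 50, 0.01

model = tf.keras.Sequential([
    tf.keras.Input(shape=(state_size,)),
    tf.keras.layers.Dense(hidden_size, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # outputs Pr(a = 0 | s)
])

def train(env, train_data, n_steps=10_000):
    sample = random.choice(train_data)                 # sample ~ Uniform(train_data)
    s = env.reset(sample)
    all_gradients, episode = [], 0
    for _ in range(n_steps):                           # "while True" in the pseudocode
        with tf.GradientTape() as tape:
            p_a0 = model(tf.constant([s], dtype=tf.float32))[0, 0]
            # xi ~ Bernoulli(Pr(a = 0)); the Early Exploration smoothing
            # could be applied to float(p_a0) here before sampling.
            xi = 1.0 if random.random() < float(p_a0) else 0.0
            y = 1.0 - xi
            # cross_entropy(y, Pr(a = 0))
            loss = -(y * tf.math.log(p_a0 + 1e-8) + (1 - y) * tf.math.log(1 - p_a0 + 1e-8))
        all_gradients.append(tape.gradient(loss, model.trainable_variables))
        s, r, done = env.step(xi)
        episode += 1
        if done:
            # theta <- theta - alpha * r * mean(all_gradients)
            for var, *grads in zip(model.trainable_variables, *all_gradients):
                var.assign_sub(alpha * r * tf.reduce_mean(tf.stack(grads), axis=0))
            sample = random.choice(train_data)
            s = env.reset(sample)
            all_gradients = []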
EARLY EXPLORATION
• We encourage exploration in the initial steps.
  – Exploration diminishes in later steps.
• This way we may avoid getting locked into local minima early.

   ξ ∼ Bernoulli((Pr(a = 0) + p) / (1 + 2 × p)),   where p = 0.2 × 3000 / (3000 + episode)
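The smoothed sampling can be packaged as a small helper; the constants 0.2 and 3000 are taken from the formula above, while the function names are illustrative.

import random

# Sketch of the exploration-smoothed sampling above.

def exploration_bonus(episode: int) -> float:
    """p starts at 0.2 and decays towards 0 as training progresses."""
    return 0.2 * 3000 / (3000 + episode)

def sample_action(pr_a0: float, episode: int) -> int:
    """Draw xi ~ Bernoulli((Pr(a = 0) + p) / (1 + 2*p))."""
    p = exploration_bonus(episode)
    prob = (pr_a0 + p) / (1 + 2 * p)
    return 1 if random.random() < prob else 0

# Early on, even a confident Pr(a = 0) is pulled towards 0.5:
print(exploration_bonus(0))       # p = 0.2, so Pr = 0.95 becomes (0.95 + 0.2)/1.4 ≈ 0.82
print(exploration_bonus(30000))   # p ≈ 0.018, so the smoothing is almost gone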
RESULTS ON BIOASQ 5B DATA
[Results figure/table on the BioASQ 5b data; not recoverable from the extracted text.]
THE STATE
• The state should contain all the information needed to choose sentence i.
• The state must encode an arbitrary number of sentences.
• The sentences are processed sequentially and decisions cannot be undone.

The state is built from five tf.idf vectors (see the sketch below):
1. tf.idf of the candidate sentence i.
2. tf.idf of the entire input text to summarise.
3. tf.idf of the summary generated so far.
4. tf.idf of the candidate sentences that are yet to be processed.
5. tf.idf of the question.
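As an illustration, such a state could be assembled with scikit-learn's TfidfVectorizer; fitting the vectoriser per document and concatenating the five vectors into one array are assumptions of this sketch, not details given on the poster.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Sketch of the state described above: five tf.idf vectors, one per listed feature.

def build_state(sentences, i, selected, question):
    """State for candidate sentence i, given the sentences selected so far."""
    vec = TfidfVectorizer()
    vec.fit(sentences + [question])
    tfidf = lambda text: vec.transform([text]).toarray()[0]
    return np.concatenate([
        tfidf(sentences[i]),                 # 1. candidate sentence i
        tfidf(" ".join(sentences)),          # 2. entire input text
        tfidf(" ".join(selected)),           # 3. summary generated so far
        tfidf(" ".join(sentences[i + 1:])),  # 4. candidates yet to be processed
        tfidf(question),                     # 5. the question
    ])

# Example
sents = ["Gene X regulates Y.", "Y is a kinase.", "Unrelated sentence."]
state = build_state(sents, i=1, selected=[sents[0]], question="What does gene X regulate?")
print(state.shape)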
SOURCE CODE
https://github.com/dmollaaliod/bioasq-rl
CONTACT
• http://comp.mq.edu.au/~diego/
• [email protected]
REFERENCES
[1] Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256.
[2] Aurélien Géron. 2017. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media.