Point-Based Value Iteration for Constrained POMDPs

Dongho Kim, Jaesong Lee, Kee-Eung Kim (Department of Computer Science)
Pascal Poupart (School of Computer Science)

IJCAI-2011, July 22, 2011

Motivation

[Figure: agent-environment loop; the agent executes actions and receives observations from the environment in pursuit of its goals]

• Partially observable Markov decision processes (POMDPs) [Kaelbling98]
   Modeling sequential decision making under partial or uncertain observations

 A single reward function encodes the immediate utility of executing actions.
 Different objectives must be manually balanced into this single reward function.

• Constrained POMDPs (CPOMDPs)
   Problems with limited resources or multiple objectives
   Maximizing one objective (reward) while constraining other objectives (costs)
   CPOMDPs have not received as much attention as constrained MDPs (CMDPs) [Altman99]
  • Exception: a DP method for finding deterministic policies [Isom08]


Motivation

• Resource-limited agent, e.g., a battery-equipped robot
   Accomplish as many goals as possible given a finite amount of energy

• Spoken dialogue system [Williams07]
   e.g., minimize dialogue length while guaranteeing a 95% dialogue success rate

 Reward: −1 for each dialogue turn
 Cost: +1 for each unsuccessful dialogue, 0 for each successful dialogue

[Figure: dialogue trajectory s₀ → s₁ → s₂ → … → s_T; each turn has R = −1, C = 0; at s_T, C = +1 if the dialogue is unsuccessful and C = 0 if it is successful]

• Goal: maximize 𝔼[∑_t γᵗ r_t]  s.t.  𝔼[∑_t γᵗ c_t] ≤ ĉ
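As a concrete reading of this objective, the following sketch (with hypothetical trajectory data, discount factor, and cost bound, none of which come from the slides) estimates the expected discounted reward and cost from sampled dialogues and checks the cost constraint:

```python
def discounted_sum(values, gamma):
    """Return sum_t gamma^t * values[t]."""
    return sum(gamma ** t * v for t, v in enumerate(values))

# Two sampled dialogues: R = -1 per turn; terminal cost C = +1 only if
# the dialogue is unsuccessful (numbers are illustrative).
trajectories = [
    {"r": [-1, -1, -1],     "c": [0, 0, 0]},     # successful, 3 turns
    {"r": [-1, -1, -1, -1], "c": [0, 0, 0, 1]},  # unsuccessful, 4 turns
]

gamma, c_bound = 0.95, 0.05  # hypothetical discount factor and cost bound

est_reward = sum(discounted_sum(t["r"], gamma) for t in trajectories) / len(trajectories)
est_cost = sum(discounted_sum(t["c"], gamma) for t in trajectories) / len(trajectories)

feasible = est_cost <= c_bound  # False here: half the sampled dialogues fail
```

With these numbers the estimated discounted cost is 0.95³/2 ≈ 0.43, which exceeds the bound, so a policy producing such trajectories would be infeasible.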

• We propose exact and approximate methods for solving CPOMDPs.


Suboptimality of deterministic policies in CPOMDPs

• Procrastinating student problem
[Figure: states AdvisorHappy, AdvisorAngry, JobDone; initial belief b₀ = (1, 0, 0); discount γ. Action "lazy" yields R = 0, C = 0, keeping AdvisorHappy with p = 0.9 and moving to AdvisorAngry with p = 0.1. Action "work" leads to JobDone with R = 1, C = 1 from AdvisorHappy and R = 0, C = 1 from AdvisorAngry; the reward and cost for "work" are incurred at each timestep.]
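The suboptimality claim can be illustrated with a small sketch (hypothetical payoff numbers, not the exact figures of the student problem): a randomized mixture of two deterministic policies can satisfy the cost bound while earning strictly more reward than every feasible deterministic policy.

```python
# Each policy is summarized by (expected discounted reward, expected discounted cost).
pi_work = (1.0, 1.0)  # high reward, but violates the cost bound on its own
pi_lazy = (0.0, 0.0)  # feasible, but earns nothing
c_bound = 0.5

# Choose the mixing probability p so the mixture meets the bound exactly.
p = (c_bound - pi_lazy[1]) / (pi_work[1] - pi_lazy[1])

mixed_reward = p * pi_work[0] + (1 - p) * pi_lazy[0]
mixed_cost = p * pi_work[1] + (1 - p) * pi_lazy[1]

# The mixture is feasible and strictly better than the best feasible
# deterministic policy (pi_lazy, reward 0).
assert mixed_cost <= c_bound and mixed_reward > pi_lazy[0]
```

Here p = 0.5, giving reward 0.5 at cost 0.5: no deterministic policy achieves this, which is why CPOMDP solution methods must search over randomized policies.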
