Efficient Reinforcement Learning with Multiple Reward ... - ijcai-11
Point-Based Value Iteration for Constrained POMDPs
Dongho Kim, Jaesong Lee, Kee-Eung Kim (Department of Computer Science)
Pascal Poupart (School of Computer Science)
IJCAI-2011, 2011. 7. 22.
Motivation
[Figure: agent-environment loop. Labels: goals, agent, action, observation, environment.]
• Partially observable Markov decision processes (POMDPs) [Kaelbling98]
  - Model sequential decision making under partial or uncertain observations
  - A single reward function encodes the immediate utility of executing actions
  - Different objectives must be manually balanced into that single reward function
• Constrained POMDPs (CPOMDPs)
  - Problems with limited resources or multiple objectives
  - Maximize one objective (reward) while constraining the other objectives (costs)
  - CPOMDPs have not received as much attention as CMDPs [Altman99]
    - Exception: DP method for finding deterministic policies [Isom08]
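As an illustrative sketch only, the two formulations contrasted above can be written side by side. The weight $\lambda$ is a hypothetical hand-tuned parameter and $\pi$ an assumed symbol for the policy; neither appears in the slides. $r_t$ and $c_t$ are the per-step reward and cost, and $c$ is the cost bound.

```latex
% Single-reward formulation: the cost is folded into the reward
% with a hand-tuned weight \lambda \ge 0 (an assumption for illustration).
\max_{\pi} \; \mathbb{E}_{\pi}\left[\sum_t \gamma^t \left(r_t - \lambda\, c_t\right)\right]

% Constrained formulation: the cost bound c is stated directly.
\max_{\pi} \; \mathbb{E}_{\pi}\left[\sum_t \gamma^t r_t\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\left[\sum_t \gamma^t c_t\right] \le c
```

In the first form the designer has to search for a $\lambda$ that happens to give an acceptable cost; in the second the bound is part of the problem statement.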
Motivation
• Resource-limited agent, e.g., battery-equipped robot
  - Accomplish as many goals as possible given a finite amount of energy
• Spoken dialogue system [Williams07]
  - e.g., minimize the length of the dialogue while guaranteeing a 95% dialogue success rate
  - Reward: −1 for each dialogue turn
  - Cost: +1 for each unsuccessful dialogue, 0 for each successful dialogue
  - Dialogue:
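[Figure: dialogue as a state sequence s_0, s_1, s_2, …, s_T. At s_0, s_1, s_2, …: R = −1, C = 0. At s_T: R = −1; C = +1 for unsuccessful dialogue, C = 0 for successful dialogue.]
• Goal: maximize $\mathbb{E}\left[\sum_t \gamma^t r_t\right]$ s.t. $\mathbb{E}\left[\sum_t \gamma^t c_t\right] \le c$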
• We propose exact and approximate methods for solving CPOMDPs.
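The dialogue example can be checked numerically with a minimal sketch. It is not the authors' method: it only evaluates the expected discounted reward and cost from the goal statement on a handful of made-up dialogues. The function names, the sample data, and the bound of 0.05 are assumptions for illustration. The bound corresponds to a 95% success rate only in the undiscounted case (γ = 1), where the expected cost equals the failure rate.

```python
from typing import List, Sequence, Tuple


def discounted_sum(values: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma^t * values[t]."""
    return sum((gamma ** t) * v for t, v in enumerate(values))


def dialogue_trajectory(num_turns: int, success: bool) -> Tuple[List[float], List[float]]:
    """Per-step rewards and costs for one dialogue, following the slide:
    R = -1 on every turn; C = 0 everywhere except the final turn,
    where C = +1 if the dialogue was unsuccessful."""
    rewards = [-1.0] * num_turns
    costs = [0.0] * num_turns
    if not success:
        costs[-1] = 1.0
    return rewards, costs


def evaluate(dialogues: Sequence[Tuple[int, bool]], gamma: float) -> Tuple[float, float]:
    """Sample averages of the discounted reward and the discounted cost."""
    total_r = 0.0
    total_c = 0.0
    for num_turns, success in dialogues:
        rewards, costs = dialogue_trajectory(num_turns, success)
        total_r += discounted_sum(rewards, gamma)
        total_c += discounted_sum(costs, gamma)
    n = len(dialogues)
    return total_r / n, total_c / n


if __name__ == "__main__":
    # Hypothetical sample: (number of turns, whether the dialogue succeeded).
    sample = [(3, True), (4, True), (5, False), (3, True)]
    cost_bound = 0.05  # assumed bound c, i.e. at most 5% failures when gamma = 1
    exp_reward, exp_cost = evaluate(sample, gamma=1.0)
    print("expected reward:", exp_reward)
    print("expected cost:", exp_cost)
    print("constraint satisfied:", exp_cost <= cost_bound)
```

With γ < 1 a failure at turn T contributes γ^T rather than 1 to the cost, so the bound no longer reads directly as a failure rate.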