Pathologies of Approximate Policy Iteration in Dynamic Programming

Dimitri P. Bertsekas
Laboratory for Information and Decision Systems
Massachusetts Institute of Technology
March 2011
Summary

We consider policy iteration with cost function approximation:
- Widely used, but exhibits very complex behavior and a variety of potential pathologies
- Case in point: the tetris test problem

Two types of pathologies:
- Deterministic: due to cost function approximation
- Stochastic: due to simulation errors/noise
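To fix ideas, here is a minimal sketch of the approximate policy iteration loop on a small discounted MDP. Everything in it (the random MDP, the feature matrix Phi, least-squares policy evaluation) is an illustrative assumption, not the specific setup of the talk; it only shows where the approximate evaluation and greedy improvement steps sit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, alpha = 10, 3, 0.95            # states, actions, discount factor

# Random transition probabilities P[u, i, j] and expected stage costs g[u, i]
P = rng.random((m, n, n))
P /= P.sum(axis=2, keepdims=True)
g = rng.random((m, n))

Phi = rng.random((n, 4))             # 4 features per state (rank < n)

def evaluate(mu):
    """Approximate policy evaluation: fit J_mu ~ Phi r by least squares."""
    P_mu = P[mu, np.arange(n)]       # n x n transition matrix under mu
    g_mu = g[mu, np.arange(n)]
    J_mu = np.linalg.solve(np.eye(n) - alpha * P_mu, g_mu)  # exact J_mu
    r, *_ = np.linalg.lstsq(Phi, J_mu, rcond=None)          # projection
    return Phi @ r                   # approximate cost vector

def improve(J):
    """Policy improvement: greedy policy w.r.t. the approximate costs J."""
    Q = g + alpha * (P @ J)          # Q[u, i] = g(i,u) + alpha * E[J(next)]
    return Q.argmin(axis=0)

mu = np.zeros(n, dtype=int)
for k in range(20):
    J_tilde = evaluate(mu)
    mu_new = improve(J_tilde)
    if np.array_equal(mu_new, mu):   # may never trigger: policies can oscillate
        break
    mu = mu_new
print("final policy:", mu)
```

Because the improvement step is driven by an approximate cost vector, the loop need not settle on a single policy; the termination test above can fail forever, which is exactly the oscillation phenomenon discussed next.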
We survey the pathologies in:
- Policy evaluation: due to errors in the approximate evaluation of policies
- Policy improvement: due to the policy improvement mechanism
Special focus: policy oscillations and local attractors

Causes of the problem in TD/projected equation methods:
- The projection operator may not be monotone
- The projection norm may depend on the policy evaluated
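As a pointer to why these two properties matter, the projected Bellman equation underlying TD methods can be written as below. This is the standard formulation in generic notation, not necessarily the talk's exact symbols; the weights ξ_µ are, in the usual setting, the steady-state distribution of the evaluated policy, which is where the policy dependence of the projection norm enters.

```latex
% Projected Bellman equation solved by TD/projected equation methods:
% find r such that Phi r is the projection of T_mu(Phi r).
\[
  \Phi r_\mu \;=\; \Pi_{\xi_\mu}\, T_\mu(\Phi r_\mu),
  \qquad
  \Pi_{\xi_\mu} J \;=\; \arg\min_{\hat J \in \{\Phi r \,:\, r\}}
      \|\hat J - J\|_{\xi_\mu},
\]
% where the weighted norm is
\[
  \|J\|_{\xi}^2 \;=\; \sum_i \xi(i)\, J(i)^2 .
\]
```

Two facts drive the oscillation phenomena: the projection Π_ξ need not be monotone (unlike Tµ, it may reverse the componentwise ordering of cost vectors), and ξ_µ changes whenever the policy changes, so each policy is evaluated in a different norm.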
We discuss methods that address these difficulties
References
D. P. Bertsekas, "Pathologies of Temporal Difference Methods in Approximate Dynamic Programming," Proc. 2010 IEEE Conference on Decision and Control, Atlanta, GA, 2010.

D. P. Bertsekas, Dynamic Programming and Optimal Control, Vol. II, Athena Scientific, 2007; supplementary chapter on approximate DP available online (a "living chapter").
MDP: Brief Review

J*(i) = optimal cost starting from state i
Jµ(i) = cost starting from state i using policy µ

Denote by T and Tµ the DP mappings that transform J ∈ ℜⁿ into the vectors TJ and TµJ, with components

(TJ)(i) = min_{u∈U(i)} Σ_j p_ij(u) ( g(i,u,j) + α J(j) ),
(TµJ)(i) = Σ_j p_ij(µ(i)) ( g(i,µ(i),j) + α J(j) ).
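A small numeric sketch of these two mappings, using the same illustrative MDP conventions as the earlier snippet (P[u, i, j] transition probabilities, g[u, i] expected stage costs, discount α < 1):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, alpha = 5, 2, 0.9
P = rng.random((m, n, n))
P /= P.sum(axis=2, keepdims=True)
g = rng.random((m, n))

def T(J):
    """(T J)(i) = min_u [ g(i,u) + alpha * sum_j p_ij(u) J(j) ]"""
    return (g + alpha * (P @ J)).min(axis=0)

def T_mu(J, mu):
    """(T_mu J)(i) = g(i,mu(i)) + alpha * sum_j p_ij(mu(i)) J(j)"""
    return (g + alpha * (P @ J))[mu, np.arange(n)]

# Repeated application of T converges to J*: T is an alpha-contraction
# in the max norm, so the iterates form a Cauchy sequence.
J = np.zeros(n)
for _ in range(500):
    J = T(J)
print("J* ~", J)
```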