Fixed Points for Markov Decision Processes - TUM

Recommend Documents

2005 Jack L. King. Markov Decision Processes. Jack L. King, Ph.D. Genoa (UK) Limited. A Brief Introduction and Overview ...

Markov Processes in Isabelle/HOL - TUM

ing on R. The backward equation plays a central role in analyzing CTMCs, they .... probabilistic programming language se

Markov Processes in Isabelle/HOL - TUM

applied as models in queuing and reliability theory (Trivedi. 2002) as well as ..... starts in a state x, calls K x to r

One-Counter Markov Decision Processes

Sep 11, 2009 - The authors thank Petr Jancar, Richard Mayr, and Olivier Serre for pointing out the. PSPACE-hardness of the ... V. Shoup. A Computational ...

Characterizing Markov Decision Processes - CiteSeerX

guaranteed to produce stable or unstable behavior. Moreover ... optimization algorithms can be drastically affected by characteristics of the problem at hand. ... teristics of the search space for a given problem instance, such as the number of local

markov decision processes lodewijk kallenberg

Chapter 1 introduces the Markov decision process model as a sequential .... mial algorithms exist, e.g. of order O(N3), where N is the number of states. ...... variables that have nonnegative integer values and that the numbers pj(t) := P{Dt = j} are

INTRODUCTION TO MARKOV DECISION PROCESSES

Apr 29, 2010 ... Introduction to Markov Decision Processes. Motivation: Reinforcement Learning. • Reinforcement learning (RL) is a computational approach to ...

Variance-penalized Markov decision processes

Oct 23, 2013 - Programming and Reinforcement Learning Techniques. Abhijit A. Gosavi ... Keywords: Variance-penalized MDPs; dynamic programming; risk penalties; reinforcement ...... Chose a value for C in the interval (0,1). Chose any ...

Learning Qualitative Markov Decision Processes

this paper, a system is described that can automatically produce a state .... the set of states is described via a set of random variables X = {X1, .., Xn}, where each.

Factored Partially Observable Markov Decision Processes for ...

how a MDP may be applied to dialogue management, and. Singh et al. [2002] show ... dialogue trouble, such as different sources of speech rec- ognition errors. ..... which the user is trying to buy a ticket to travel from one city to another city.

Compositional reasoning for Markov decision processes - CiteSeerX

Nov 11, 2011 - Matthew Hennessy2â. 1Shanghai Jiao Tong ...... [DvGHM09] Yuxin Deng, Rob van Glabbeek, Matthew Hennessy, and Carroll Morgan. Testing.

Permissive Supervisor Synthesis for Markov Decision Processes ...

Mar 21, 2017 - LO] 21 Mar 2017. 1 .... tool COMICS [24] and L* learning library libalf [25]. ..... COMICS to find the counterexample path which is then fed.

Markov Decision Processes for Optimizing Human Workflows

The Business Process Management Notation (BPMN) has become a standard to ... taking aware of process modeling in order to get certain level of Maturity in BPM ..... We now introduce software solutions for solving MDP's applied to business ...

Partially Observable Markov Decision Processes ... - Semantic Scholar

values by restricting the planner to consider only the likelihood of the best ... Keywords: Spoken dialog systems, dialog management, partially observ- ...... In the TRAVEL application, a user is trying to buy a ticket to travel from one city to ...

Bayesian learning of noisy Markov decision processes

Nov 26, 2012 - ML] 26 Nov 2012. BAYESIAN LEARNING OF NOISY MARKOV DECISION. PROCESSES. SUMEETPAL S. SINGH, NICOLAS CHOPIN, AND ...

Hard Constrained Semi-Markov Decision Processes

we study the hard constrained (HC) problem in continuous time, state and action ... MDP with constraints have come a long way since the late 80's when Beulter et al ..... a discount rate of 0.5, there is always a possibility of get- ting a total ...

An Introduction to Markov Decision Processes

MDP Tutorial - 1. An Introduction to. Markov Decision Processes. Bob Givan. Ron Parr. Purdue University. Duke University ...

Reinforcement Learning and Markov Decision Processes

Behavior by Martijn van Otterlo (2008), later published at IOS Press (2009). .... ever, this puts a heavy burden on the designer or programmer of the system. All sit ...

Reinforcement Learning and Markov Decision Processes

and concepts behind Markov decision processes and two classes of algorithms for computing optimal behaviors: reinforcement learning and dynamic programming. ... Behavior by Martijn van Otterlo (2008), later published at IOS Press (2009).

Bounded Parameter Markov Decision Processes with ... - CiteSeerX

average reward problems, prove the existence of Blackwell optimal poli- cies and .... set, the maximum (or minimum) of qT V (a linear function of q) appearing in.

"Markov Decision Processes", by Lodewijk Kallenberg

reward at decision time point t for an action a in state i will be denoted by rt i(a); if the reward is independent of t

Average-Reward Decentralized Markov Decision Processes - IJCAI

and decentralized partially observable Markov decision pro- ..... IIS-. 0328601 and IIS-0535061. References. [Arapostathis et al.,1993] A. Arapostathis, V. S. ...

Transition-Independent Decentralized Markov Decision Processes

Transition-Independent Decentralized Markov Decision. Processes. Raphen Becker, Shlomo Zilberstein, Victor Lesser, Claudia V. Goldman. Department of ...

semi-markov decision processes - Semantic Scholar

where Rm denotes the reward earned between the (m 2 1)st and the (m)th epochs and Tm denotes the ..... PROPOSITION 2: For all u [ U0, lim inf t!1. 1 t Ñt. 0 h Rs,. 1 t Ñt. 0. Rq dq ds. Â¼. X x,a h Â¯r (x,a),. X y,b ...... San Francisco: Holden-Day.

Fixed Points for Markov Decision Processes - TUM

Download PDF

2 downloads 158 Views 205KB Size Report

Comment

fixed points through expectation on Markov chains and maximal and minimal ..... processes. In SAS 2003, volume 58 of Sci

Fixed Points for Markov Decision Processes Johannes H¨olzl ∗ TU M¨unchen [email protected]

Abstract The goal of this paper is to advertise the application of fixed points and ω-complete partial orders (ω-cpos) in the formalization and analysis of probabilistic programming languages. The presented work is formalized in the Isabelle theorem prover. By applying ω-cpos to the analysis of MDPs we get a nice theory of fixed points. This allows us to transfer least and greatest fixed points through expectation on Markov chains and maximal and minimal expectation on MDPs. One application is to define the operational semantics of pGCL by MDPs, e.g. relating the denotational and operational semantics of pGCL is now a matter of fixed point equations and induction.

1.

Introduction

It is standard to use fixed points to describe solutions for probabilistic model checking [2], or to give semantics for probabilistic programming languages as fixed points [5]. This is similar to the work by Monniaux [7]. Both are fixed points on complete lattices + (either [0, 1] or R , the non-negative reals with ∞). We are concerned with a more general class, namely ω-complete partial orders (ω-cpos). Here fixed points are only well-defined for continuous functions. A function F is sup-continuous (and inf-continuous) if for all increasing (and decreasing) sequences X, the function F commutes with supremum (and infimum) of X: d d F F F ( i Xi ) = i F (Xi ) and F ( i Xi ) = i F (Xi ). (1) The least (and greatest) fixed point of F is equal to the supremum (and infimum) of the iterations of F , when F is continuous: d F (2) gfp F = i F i (>) lfp F = i F i (⊥) Hence for continuous functions F, G, α on ω-cpos we derive a transfer rule similar to the fusion rule in [4]: α(⊥) = ⊥ α◦G=F ◦α α(lfp G) = lfp F

(3)

We get a dual rule for greatest fixed points. We need ω-cpos as they are compatible with measure theory: the + Borel-measurable functions M A, B R are not a complete lattice, but a ω-cpo, for each measurable spaces A. We know that least (and greatest) fixed points lfp F (and gfp F ) are measurable assuming the functional F is continuous and measurable: + + F ∈ M A, B R → M A, B R (4) In the rest of this paper we always assume that functions are measurable. From Lebesgue’s monotone convergence theorem follows that Lebesgue integration is a continuous function. There is a similar monotone convergence theorem for decreasing sequences. However, this rules requires that the integral is finite starting at some ∗ Supported

by the DFG Projekt NI 491/15-1

point i in the sequence Xi . We derive the following transfer rule on the Lebesgue integral: F, R G sup-continuous R ∀s, f ≤ lfp F. F f dMs = G (λs. f dMs ) Z lfp F dMs = lfp G s

(5)

Compared to Rule (3) we provide it with a stronger equation (restricted to f ≤ lfp F ), as this is necessary for certain proofs. Also adding the index s to the family of probability measures Ms will be very handy when working on the trace spaces of Markov chains and MDPs. We also assume measurability of F as the one in Rule (4). The rule for the greatest fixed point is very similar, however it requires that the integral is always finite. R F, G inf-continuous ∀f, s. F f dM R R s