Solving Factored MDPs via Non-Homogeneous Partitioning

Kee-Eung Kim and Thomas Dean
Department of Computer Science, Brown University
Providence, RI 02912-1910
{kek,tld}@cs.brown.edu

Abstract

This paper describes an algorithm for solving large state-space MDPs (represented as factored MDPs) using search by successive refinement in the space of non-homogeneous partitions. Homogeneity is defined in terms of bisimulation and reward equivalence within blocks of a partition. Since homogeneous partitions that define equivalent reduced state-space MDPs can have a large number of blocks, we relax the requirement of homogeneity. The algorithm constructs approximate aggregate MDPs from non-homogeneous partitions, solves the aggregate MDPs exactly, and then uses the resulting value functions as part of a heuristic in refining the current best non-homogeneous partition. We outline the theory motivating the use of this heuristic and present empirical results and comparisons.
1 Introduction

Markov Decision Processes (MDPs) employing representations that factor states, actions, and their associated transition and reward functions in terms of component functions of state variables (fluents) have surfaced as plausible models for planning under uncertainty [Boutilier et al., 1999]. Since the number of states in these factored MDPs (FMDPs) is exponential in the number of fluents, traditional iterative methods that require enumerating states are typically not effective.

In this paper, we solve FMDPs by constructing and refining non-homogeneous partitions of the state space. In a homogeneous partition, any two states in a block have the same reward and the same distribution with respect to transitions to other blocks. Non-homogeneous partitions allow variation among the states in a block with regard to rewards and block transition distributions. From a non-homogeneous partition, we construct an aggregate MDP by averaging the transition probabilities and the rewards within blocks.

The algorithm described in this paper solves FMDPs by successive refinement in the space of non-homogeneous partitions. We use factored representations to encode the partitions and the aggregate MDPs we construct from them. We can solve the aggregate MDPs in time polynomial in the number of blocks in the partition. The resulting optimal policies (optimal for the aggregate MDPs) also serve as policies for the original MDP, and the value function of the optimal policy in the aggregate MDP serves as an estimate of that policy's value in the original MDP. Given this estimate and the current partition, we choose the refinement that yields the greatest improvement and iterate. In the remainder of this paper, we present some background, provide a theoretical motivation for our refinement heuristic, and describe the results of a series of experiments.
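To make the aggregation step concrete, the sketch below (in Python with NumPy) shows one way an aggregate MDP could be built from a partition by uniformly averaging transition probabilities and rewards within blocks, and then solved exactly by value iteration. The flat matrix representation and the names `build_aggregate_mdp` and `solve_aggregate_mdp` are illustrative assumptions for a small example, not the paper's implementation, which works with factored representations of the partitions and aggregate MDPs.

```python
import numpy as np

def build_aggregate_mdp(P, R, blocks):
    """Aggregate MDP from a partition (sketch; uniform within-block averaging).

    P      : array (|A|, |S|, |S|); P[a, s, s'] = Pr(s' | s, a)
    R      : array (|S|,); per-state reward
    blocks : list of lists of state indices, one list per block
    """
    num_actions, k = P.shape[0], len(blocks)
    P_agg = np.zeros((num_actions, k, k))
    R_agg = np.zeros(k)
    for i, src in enumerate(blocks):
        R_agg[i] = R[src].mean()  # average reward within the block
        for j, dst in enumerate(blocks):
            # Mass sent from each source state into block j, averaged over the block.
            mass = P[:, src, :][:, :, dst].sum(axis=2)  # shape (|A|, |src|)
            P_agg[:, i, j] = mass.mean(axis=1)
    return P_agg, R_agg

def solve_aggregate_mdp(P_agg, R_agg, gamma=0.95, tol=1e-6):
    """Value iteration on the (small) aggregate MDP; returns V and a greedy policy."""
    V = np.zeros(R_agg.shape[0])
    while True:
        Q = R_agg[None, :] + gamma * (P_agg @ V)  # shape (|A|, k)
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```

In this sketch, the block value function can be lifted to the original MDP by assigning every state the value of its block, which is the kind of estimate the refinement heuristic described above relies on.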
2 Factored MDPs (FMDPs)

Definition 1 (FMDP) An FMDP is defined as a tuple $M = \{\vec{X}, A, T, R\}$ where

• $\vec{X} = [X_1, \ldots, X_n]$ is a vector of fluents which collectively define the state. We use $\Omega_{X_i}$ to denote the set of values for $X_i$; $\Omega_{X_i}$ is the sample space of $X_i$ when $X_i$ is considered as a random variable. Thus, the state space of $M$ is $\Omega_{\vec{X}} = \prod_i \Omega_{X_i}$. We use lower case letters $\vec{x} = [x_1, \ldots, x_n]$ to denote a particular instantiation of the fluents, and $X_{i,t}$ and $x_{i,t}$ denote, respectively, a fluent and its value at a particular time $t$.

• $A$ is the set of actions.

• $T$ is the set of transition probabilities, represented as the set of conditional probability distributions, one for each action and fluent:
$$T(\vec{X}_t, a, \vec{X}_{t+1}) = \prod_{i=1}^{n} P(X_{i,t+1} \mid \mathrm{pa}(X_{i,t+1}), a)$$
where $\mathrm{pa}(X_{i,t+1})$ denotes the set of parents of $X_{i,t+1}$ in the graphical model (see below). Note that $\forall i,\ \mathrm{pa}(X_{i,t+1}) \subseteq \{X_{1,t}, \ldots, X_{n,t}\}$.

• $R : \vec{X} \to \mathbb{R}$ is the reward function. Without loss of generality, we define the reward to be determined by both the state and the action, $R : \vec{X} \times A \to \mathbb{R}$.
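As a reading aid for Definition 1, here is a minimal Python sketch of one possible explicit encoding of an FMDP, with the transition probability computed as the product of per-fluent conditional distributions. The class and attribute names are hypothetical and chosen only for illustration; the paper itself encodes these objects compactly with graphical models rather than the explicit dictionaries used here.

```python
class FMDP:
    """Illustrative (non-compact) container for the tuple M = {X, A, T, R}."""

    def __init__(self, fluent_domains, actions, cpds, parents, reward):
        self.fluent_domains = fluent_domains  # Omega_{X_i} for each fluent X_i
        self.actions = actions                # the action set A
        # cpds[a][i] maps a tuple of parent values to a dict {value: probability},
        # i.e. P(X_{i,t+1} | pa(X_{i,t+1}), a).
        self.cpds = cpds
        self.parents = parents                # parents[i] = indices of pa(X_{i,t+1}) among X_{.,t}
        self.reward = reward                  # reward(x, a) -> float

    def transition_prob(self, x, a, x_next):
        """T(x_t, a, x_{t+1}) as the product over fluents of per-fluent CPDs."""
        prob = 1.0
        for i, value in enumerate(x_next):
            parent_values = tuple(x[j] for j in self.parents[i])
            prob *= self.cpds[a][i][parent_values].get(value, 0.0)
        return prob

    def num_states(self):
        """|Omega_X| is exponential in the number of fluents."""
        n = 1
        for domain in self.fluent_domains:
            n *= len(domain)
        return n
```

The exponential size reported by `num_states()` is precisely what makes state enumeration impractical and motivates working with partitions of the state space instead.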