Min Max Generalization for Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

Raphael Fonteneau, Damien Ernst, Bernard Boigelot, Quentin Louveaux
University of Liège

Mini-workshop on Reinforcement Learning
Department of Electrical Engineering and Computer Science, University of Liège
September 29th, 2011

Formalization

The batch mode setting
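In the batch mode setting, the system is known only through a finite collection of trajectories, which can be cut into one-step transitions (x^l, u^l, r^l, y^l). The sketch below shows one hypothetical way to store such a batch. The names Transition, split_trajectory and build_batch are ours, and so is the choice to represent states and actions as tuples of floats.

```python
from typing import List, NamedTuple, Sequence, Tuple

Vector = Tuple[float, ...]  # assumption: states and actions are tuples of floats


class Transition(NamedTuple):
    """One-step transition (x^l, u^l, r^l, y^l) of a deterministic system."""
    x: Vector  # state
    u: Vector  # action applied in x
    r: float   # observed reward
    y: Vector  # observed successor state


def split_trajectory(
    states: Sequence[Vector], actions: Sequence[Vector], rewards: Sequence[float]
) -> List[Transition]:
    """Cut one trajectory x_0, u_0, r_0, x_1, ... into one-step transitions."""
    if len(states) != len(actions) + 1 or len(rewards) != len(actions):
        raise ValueError("expected len(states) == len(actions) + 1 == len(rewards) + 1")
    return [
        Transition(states[t], actions[t], rewards[t], states[t + 1])
        for t in range(len(actions))
    ]


def build_batch(trajectories) -> List[Transition]:
    """Pool the transitions of all (states, actions, rewards) trajectories into one batch."""
    batch: List[Transition] = []
    for states, actions, rewards in trajectories:
        batch.extend(split_trajectory(states, actions, rewards))
    return batch


# Usage on a made-up two-step trajectory of a one-dimensional system.
trajectory = ([(0.0,), (0.5,), (1.0,)], [(0.5,), (0.5,)], [0.0, -0.25])
batch = build_batch([trajectory])
print(len(batch))  # prints 2
```

Once pooled, the transitions are treated as an unordered set: nothing in this representation remembers which trajectory a transition came from.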

Lipschitz continuity
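Written in our own notation, the Lipschitz assumptions state that, for any two state-action pairs z and z', the reward function and the dynamics satisfy |ρ(z) - ρ(z')| ≤ L_ρ·d(z, z') and ‖f(z) - f(z')‖ ≤ L_f·d(z, z'). The slide titles do not fix the metric d; the minimal sketch below assumes the Euclidean norm of the concatenated (x, u) vector. A batch of transitions can never prove the constants, but it does impose a floor on them: the largest ratio observed between any two transitions. The function smallest_consistent_constants is a hypothetical helper.

```python
import math
from itertools import combinations
from typing import NamedTuple, Sequence, Tuple

Vector = Tuple[float, ...]


class Transition(NamedTuple):
    x: Vector  # state
    u: Vector  # action
    r: float   # reward
    y: Vector  # successor state


def sa_distance(a: Transition, b: Transition) -> float:
    """Assumed metric: Euclidean norm of the concatenated (x, u) difference."""
    return math.dist(a.x + a.u, b.x + b.u)


def smallest_consistent_constants(batch: Sequence[Transition]) -> Tuple[float, float]:
    """Smallest (L_f, L_rho) not contradicted by the sample; the true ones are >= these."""
    l_f = l_rho = 0.0
    for a, b in combinations(batch, 2):
        d = sa_distance(a, b)
        if d == 0.0:
            # Same (x, u) twice: a deterministic system must repeat y and r.
            if a.y != b.y or a.r != b.r:
                raise ValueError("sample contradicts deterministic dynamics")
            continue
        l_f = max(l_f, math.dist(a.y, b.y) / d)
        l_rho = max(l_rho, abs(a.r - b.r) / d)
    return l_f, l_rho


batch = [
    Transition((0.0,), (0.0,), 0.0, (0.0,)),
    Transition((1.0,), (0.0,), 2.0, (0.5,)),
]
print(smallest_consistent_constants(batch))  # prints (0.5, 2.0)
```

If the constants assumed for an analysis are smaller than these values, the Lipschitz assumptions are already violated by the data.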

The worst that can happen

[Figure: UARS Satellite, Liège]
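For a single stage (T = 1), the worst case admits a closed form that follows directly from the reward assumption; this is our own derivation for this special case, not a result taken from the slides. Every L_ρ-Lipschitz reward function consistent with the sample satisfies ρ(x_0, u_0) ≥ r^l - L_ρ·d((x_0, u_0), (x^l, u^l)) for every sample l. The function z ↦ max_l (r^l - L_ρ·d(z, z^l)) attains this bound while still matching the data, provided the data respect L_ρ. The worst case is therefore the maximum of these terms. The minimal sketch assumes the same Euclidean metric on (x, u) as before, and worst_one_stage_reward is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, ...]
Sample = Tuple[Vector, Vector, float]  # (x^l, u^l, r^l); successors are unused when T = 1


def worst_one_stage_reward(
    samples: Sequence[Sample], x0: Vector, u0: Vector, l_rho: float
) -> float:
    """Lowest reward at (x0, u0) over all L_rho-Lipschitz rewards matching the samples."""
    if not samples:
        raise ValueError("without samples the worst case is unbounded")
    # Assumed metric: Euclidean norm of the concatenated (x, u) difference.
    return max(r - l_rho * math.dist(x + u, x0 + u0) for x, u, r in samples)


samples = [((0.0,), (0.0,), 1.0), ((2.0,), (0.0,), 3.0)]
print(worst_one_stage_reward(samples, (1.0,), (0.0,), 1.0))  # prints 2.0
```

The nearest sample does not necessarily give the binding term: here both samples are at distance 1, and the one with the larger reward determines the worst case.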

Given:

● The batch collection of trajectories

● The Lipschitz continuity assumptions + two constants

The T-stage problem
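For T stages, no closed form like the one-stage case is available. A simple but possibly loose lower bound on the return of an action sequence (u_0, ..., u_{T-1}) from x_0 can still be derived from the same assumptions. The return of the last N stages is Lipschitz in the state-action pair where it starts, with constant L_{Q_N} = L_ρ·(1 + L_f + ... + L_f^{N-1}). Chaining sampled transitions l_0, ..., l_{T-1} then gives J ≥ Σ_t [ r^{l_t} - L_{Q_{T-t}}·d((y^{l_{t-1}}, u_t), (x^{l_t}, u^{l_t})) ] with y^{l_{-1}} = x_0. The best chain is found by dynamic programming in O(T·n²). This is a baseline sketch we derive here under an assumed Euclidean metric on (x, u). It is not the relaxation schemes presented in these slides, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Transition = Tuple[Vector, Vector, float, Vector]  # (x^l, u^l, r^l, y^l)


def lipschitz_q_constants(T: int, l_f: float, l_rho: float) -> List[float]:
    """consts[N] = L_rho * (1 + L_f + ... + L_f**(N-1)), for N = 0..T."""
    consts = [0.0]
    for _ in range(T):
        consts.append(l_rho + l_f * consts[-1])
    return consts


def chained_lower_bound(
    batch: Sequence[Transition], x0: Vector, actions: Sequence[Vector],
    l_f: float, l_rho: float,
) -> float:
    """Best lower bound over all chains of sampled transitions (Viterbi-style DP)."""
    T = len(actions)
    if T == 0:
        return 0.0
    if not batch:
        raise ValueError("empty batch")
    lq = lipschitz_q_constants(T, l_f, l_rho)
    # best[l]: best partial bound for stages 0..t when stage t uses sample l.
    best = [r - lq[T] * math.dist(x0 + actions[0], x + u) for x, u, r, _ in batch]
    for t in range(1, T):
        c = lq[T - t]
        best = [
            r + max(
                best[k] - c * math.dist(batch[k][3] + actions[t], x + u)
                for k in range(len(batch))
            )
            for x, u, r, _ in batch
        ]
    return max(best)


one = [((0.0,), (0.0,), 1.0, (0.0,))]
print(chained_lower_bound(one, (1.0,), [(0.0,), (0.0,)], 1.0, 1.0))  # prints 0.0
```

With a single sample and x_0 at distance 1, the bound is 0.0, which is also what a hand analysis of the worst-case system gives for this toy example.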

Any suggestions?

So let us start with the 2-stage case...

The 2-stage problem

First results

Relaxation scheme: trust region
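Our reading of this heading, which the slide text does not spell out, is that dropping some constraints leaves trust-region subproblems: minimize a possibly nonconvex quadratic ½·zᵀAz + cᵀz over a ball ‖z‖ ≤ Δ. Such problems can be solved to global optimality despite the nonconvexity. An orthogonal change of variables A = QΛQᵀ leaves the ball unchanged, so the minimal sketch below works directly in the eigenbasis. It takes the eigenvalues w and the transformed linear term c and needs no linear-algebra library. The name solve_diagonal_trs is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def solve_diagonal_trs(
    w: Sequence[float], c: Sequence[float], radius: float, tol: float = 1e-12
) -> Tuple[List[float], float]:
    """Globally minimise 0.5*sum(w_i z_i^2) + sum(c_i z_i) s.t. ||z|| <= radius.

    Returns (z, lam) with (w_i + lam) z_i = -c_i, w_i + lam >= 0 and lam >= 0.
    """
    w, c = list(w), list(c)
    w_min = min(w)
    i_min = w.index(w_min)
    lam_lo = max(0.0, -w_min)  # smallest lam keeping every w_i + lam >= 0

    def step(lam: float) -> List[float]:
        # z_i(lam) = -c_i / (w_i + lam); coordinates with w_i + lam == 0 stay at 0.
        return [-ci / (wi + lam) if wi + lam > 0 else 0.0 for wi, ci in zip(w, c)]

    def norm(z: Sequence[float]) -> float:
        return math.sqrt(sum(zi * zi for zi in z))

    # Convex case with the unconstrained minimiser inside the ball: lam = 0.
    if w_min > 0 and norm(step(0.0)) <= radius:
        return step(0.0), 0.0

    # "Hard case": c vanishes on the lowest eigenvalue and step(lam_lo) is inside
    # the ball; move along that eigen-direction until the boundary is reached.
    if all(abs(ci) <= tol for wi, ci in zip(w, c) if wi == w_min):
        z = step(lam_lo)
        gap = radius ** 2 - norm(z) ** 2
        if gap >= 0.0:
            z[i_min] = math.sqrt(gap)
            return z, lam_lo

    # Otherwise ||step(lam)|| decreases through `radius` on (lam_lo, inf): bisect.
    lo, hi = lam_lo, lam_lo + norm(c) / radius + 1.0  # ||step(hi)|| < radius
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if norm(step(mid)) > radius:
            lo = mid
        else:
            hi = mid
    return step(hi), hi  # step(hi) is always feasible


# Indefinite example: eigenvalues -1 and 2 over the unit ball.
z, lam = solve_diagonal_trs([-1.0, 2.0], [1.0, 1.0], 1.0)
print(z, lam)
```

The multiplier lam certifies global optimality: W + lam·I is positive semidefinite, (W + lam·I)z = -c, and lam is zero unless z lies on the boundary.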


Relaxation scheme: Lagrangian dual
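A Lagrangian relaxation rests on weak duality. For minimize f_0(z) subject to f_i(z) ≤ 0, every multiplier vector μ ≥ 0 gives g(μ) = inf_z [f_0(z) + Σ_i μ_i·f_i(z)] ≤ optimal value. The dual therefore always delivers a lower bound, which is the safe direction for a worst-case return. The minimal sketch below assumes diagonal quadratic functions, a hypothetical simplification that keeps the infimum in closed form. It also maximizes over a crude grid of multipliers, whereas a real implementation would pass the dual to a convex solver. All names are ours.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

# Diagonal quadratic: f(z) = 0.5*sum(a_j z_j^2) + sum(b_j z_j) + c
Quad = Tuple[List[float], List[float], float]


def evaluate(q: Quad, z: Sequence[float]) -> float:
    a, b, c = q
    return 0.5 * sum(aj * zj * zj for aj, zj in zip(a, z)) + sum(bj * zj for bj, zj in zip(b, z)) + c


def dual_function(objective: Quad, constraints: Sequence[Quad], mu: Sequence[float]) -> float:
    """g(mu) = inf_z objective(z) + sum_i mu_i * constraint_i(z)."""
    a, b, c = list(objective[0]), list(objective[1]), objective[2]
    for m, (ai, bi, ci) in zip(mu, constraints):
        a = [x + m * y for x, y in zip(a, ai)]
        b = [x + m * y for x, y in zip(b, bi)]
        c += m * ci
    value = c
    for aj, bj in zip(a, b):
        if aj > 0:
            value -= bj * bj / (2 * aj)  # minimum of 0.5*aj*z^2 + bj*z
        elif aj < 0 or bj != 0:
            return -math.inf  # Lagrangian unbounded below in this coordinate
    return value


def dual_bound(objective: Quad, constraints: Sequence[Quad], grid: Sequence[float]) -> float:
    """Best dual value over a finite grid of non-negative multipliers."""
    return max(dual_function(objective, constraints, mu)
               for mu in product(grid, repeat=len(constraints)))


# Nonconvex objective over the unit ball 0.5*||z||^2 - 0.5 <= 0.
obj = ([-1.0, 2.0], [1.0, 1.0], 0.0)
ball = ([1.0, 1.0], [0.0, 0.0], -0.5)
print(dual_bound(obj, [ball], [k * 0.01 for k in range(1000)]))
```

With a single ball constraint, as in this usage example, the dual bound is known to be tight (the S-lemma). With several constraints a duality gap may remain, which is what makes the scheme a relaxation.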


Relaxation schemes: synthesis

Illustration





Uniformly drawn state-action pairs

Illustration

Grid

Average (uniform sampling)
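The illustrations compare samples of state-action pairs drawn uniformly at random with samples placed on a grid, averaging over draws in the uniform case. The benchmark system and the exact quantities plotted are not recoverable from this text. The sketch below therefore only mimics that protocol on a hypothetical one-stage example: reward ρ(x, u) = x·u on [-1, 1]², which is √2-Lipschitz there under the Euclidean metric, combined with the closed-form one-stage worst case. All names, the reward, and the test point are assumptions.

```python
import math
import random
from typing import List, Tuple

Pair = Tuple[float, float]  # (x, u), both in [-1, 1] in this hypothetical example
L_RHO = math.sqrt(2.0)      # valid Lipschitz constant of x*u on [-1, 1]^2


def reward(x: float, u: float) -> float:
    return x * u  # hypothetical reward function


def worst_case(points: List[Pair], x0: float, u0: float) -> float:
    """One-stage worst-case reward at (x0, u0) given the rewards observed at `points`."""
    return max(reward(x, u) - L_RHO * math.dist((x, u), (x0, u0)) for x, u in points)


def uniform_points(n: int, rng: random.Random) -> List[Pair]:
    return [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(n)]


def grid_points(k: int) -> List[Pair]:
    """k x k regular grid over [-1, 1]^2 (k >= 2)."""
    ticks = [-1.0 + 2.0 * i / (k - 1) for i in range(k)]
    return [(x, u) for x in ticks for u in ticks]


def average_uniform_bound(n: int, x0: float, u0: float, runs: int, seed: int = 0) -> float:
    """Average of the worst-case bound over `runs` independent uniform samples of size n."""
    rng = random.Random(seed)
    return sum(worst_case(uniform_points(n, rng), x0, u0) for _ in range(runs)) / runs


if __name__ == "__main__":
    x0, u0, k = 0.3, -0.2, 21
    print("true reward:", reward(x0, u0))
    print("grid bound :", worst_case(grid_points(k), x0, u0))
    print("uniform avg:", average_uniform_bound(k * k, x0, u0, runs=20))
```

On a grid with spacing h, every point lies within h/√2 of a sample, so the gap to the true reward is at most 2·L_ρ·h/√2 = 2h. Uniform samples give no such guarantee for an individual draw, which is one reason to average over draws.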

Tons of future work

T-stage problem

Stochastic frameworks

Exact solution?

Infinite horizon