A Compromise Programming Approach to Multiobjective Markov Decision Processes

Wlodzimierz Ogryczak¹, Patrice Perny and Paul Weng
LIP6 - UPMC, Paris, France

21st International Conference on MCDM
Jyvaskyla, Finland, June 13-17, 2011

¹ On leave from Warsaw University of Technology, Poland.


Sequential Decision Making under Uncertainty

[Figure: in state s, the agent chooses an action a, receives the reward r(s, a), and moves to a state s' with probability p(s, a, s')]


Markov Decision Processes (MDPs)

Definition
• S: set of states
• A: set of actions
• p : S × A × S → [0, 1], the transition function
• r : S × A → ℝ, the reward function

Solution
• Pure/randomized decision rule δ
• (Stationary) pure policy π

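To make the definition concrete, here is a minimal sketch of how such an MDP could be encoded in Python. Everything in it (the two states, two actions and all numeric values) is hypothetical, chosen only for illustration.

    # A tiny MDP encoded with plain dictionaries.
    states = ["s0", "s1"]
    actions = ["a0", "a1"]

    # p[(s, a)] maps each successor s' to the probability p(s, a, s')
    p = {
        ("s0", "a0"): {"s0": 0.9, "s1": 0.1},
        ("s0", "a1"): {"s0": 0.2, "s1": 0.8},
        ("s1", "a0"): {"s0": 0.5, "s1": 0.5},
        ("s1", "a1"): {"s1": 1.0},
    }

    # r[(s, a)] is the immediate reward r(s, a)
    r = {
        ("s0", "a0"): 0.0, ("s0", "a1"): 1.0,
        ("s1", "a0"): 2.0, ("s1", "a1"): 0.0,
    }

    # A stationary pure policy maps each state to one action.
    policy = {"s0": "a1", "s1": "a0"}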


Value Functions and Solution Methods

Value functions
• v_t^π(s) = r(s, δ_t(s)) + γ ∑_{s'∈S} p(s, δ_t(s), s') v_{t−1}^π(s')
• π ≽ π' ⇔ ∀s, v^π(s) ≥ v^{π'}(s)
• v*(s) = max_{a∈A} [ r(s, a) + γ ∑_{s'∈S} p(s, a, s') v*(s') ]

Family of solution methods
• Value/policy iteration
• Linear programming (LP)

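As a sketch of the first family member, a bare-bones value iteration in Python; it reuses the hypothetical states, actions, p and r dictionaries from the earlier example, and gamma and eps are illustrative settings.

    def value_iteration(states, actions, p, r, gamma=0.95, eps=1e-6):
        """Iterate v(s) <- max_a [ r(s, a) + gamma * sum_s' p(s, a, s') v(s') ]."""
        v = {s: 0.0 for s in states}
        while True:
            v_new = {
                s: max(r[(s, a)]
                       + gamma * sum(prob * v[s2]
                                     for s2, prob in p[(s, a)].items())
                       for a in actions if (s, a) in p)
                for s in states
            }
            # stop once the Bellman update barely changes the values
            if max(abs(v_new[s] - v[s]) for s in states) < eps:
                return v_new
            v = v_new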


Multiobjective MDPs (MMDPs)

[Figure: same dynamics as the MDP above, except that taking action a in state s now yields a reward vector (R_1(s, a), ..., R_n(s, a))]


Multiobjective MDPs (MMDPs)

Definition
• R : S × A → ℝⁿ (n criteria)
• V^π(s) ∈ ℝⁿ

Value functions
• V_t^π(s) = R(s, δ_t(s)) + γ ∑_{s'∈S} p(s, δ_t(s), s') V_{t−1}^π(s')
• π ≽ π' ⇔ ∀s, V^π(s) ≥_P V^{π'}(s)  (componentwise Pareto dominance)

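For a fixed policy, the vector value function above can be computed criterion by criterion. A minimal sketch, reusing the hypothetical states, p and policy from the earlier MDP example, with a made-up two-criteria reward table R:

    def mo_policy_evaluation(states, p, R, policy, gamma=0.95, iters=1000):
        """Iterate V(s) <- R(s, pi(s)) + gamma * sum_s' p(s, pi(s), s') V(s')."""
        n = len(next(iter(R.values())))        # number of criteria
        V = {s: [0.0] * n for s in states}
        for _ in range(iters):
            V = {
                s: [R[(s, policy[s])][j]
                    + gamma * sum(prob * V[s2][j]
                                  for s2, prob in p[(s, policy[s])].items())
                    for j in range(n)]
                for s in states
            }
        return V

    # Hypothetical two-criteria rewards for the tiny MDP used earlier
    R = {
        ("s0", "a0"): (0.0, 2.0), ("s0", "a1"): (1.0, 0.0),
        ("s1", "a0"): (2.0, 0.0), ("s1", "a1"): (0.0, 1.0),
    }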


Scalarizing Function for Compromise Search

Example
[Figure: value vectors of four policies a, b, c, d: the Pareto-optimal points among pure policies (left) and among randomized policies (right)]

Scalarizing function ψ
• ψ : ℝⁿ → ℝ, monotonic w.r.t. Pareto dominance
• v(s) = ψ(V_1(s), ..., V_n(s))
• A weighted sum does not provide any control on tradeoffs

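A small numeric illustration of the last bullet, with three made-up value vectors: c = (4, 4) is Pareto optimal but not "supported", so no positive weighted sum ever selects it, whereas a max-regret criterion relative to the ideal point (10, 10) does.

    points = {"a": (10, 0), "b": (0, 10), "c": (4, 4)}
    ideal = (10, 10)

    # Weighted sums: whatever the weights, c is never the winner.
    for i in range(101):
        w1 = i / 100
        best = max(points, key=lambda k: w1 * points[k][0] + (1 - w1) * points[k][1])
        assert best != "c"

    # Max regret w.r.t. the ideal point selects the balanced vector c.
    regret = {k: max(ideal[0] - v[0], ideal[1] - v[1]) for k, v in points.items()}
    print(min(regret, key=regret.get))   # -> c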


Reference Point Method (RPM)

Generic scalarizing achievement function (Wierzbicki, 82)

ψ_ε(y) = (1 − ε) max_{i=1...n} σ_i(y_i) + (ε/n) ∑_{i=1}^n σ_i(y_i)

σ_i(y_i) = (1 / (r_i^r − r_i^a)) · max{ β(y_i − r_i^r) + r_i^r − r_i^a, y_i − r_i^a, α(y_i − r_i^a) }

where r_i^a is the aspiration level and r_i^r the reservation level on criterion i.

[Figure: the regret η_i = σ_i(y_i) as a piecewise-linear decreasing function of y_i, with σ_i(r_i^r) = 1 and σ_i(r_i^a) = 0; slope parameter β applies below r_i^r and α beyond r_i^a]
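Both formulas are easy to transcribe. A minimal Python sketch, assuming the usual convention 0 < β < 1 < α for the two slope parameters (alpha = 2 and beta = 0.5 below are illustrative values):

    def sigma(y, r_a, r_r, alpha=2.0, beta=0.5):
        """Partial regret: 1 at the reservation level r_r, 0 at the aspiration r_a."""
        return max(beta * (y - r_r) + r_r - r_a,
                   y - r_a,
                   alpha * (y - r_a)) / (r_r - r_a)

    def psi(y, r_a, r_r, eps=0.01):
        """(1 - eps) * max_i sigma_i + (eps / n) * sum_i sigma_i."""
        regrets = [sigma(yi, ai, ri) for yi, ai, ri in zip(y, r_a, r_r)]
        return (1 - eps) * max(regrets) + (eps / len(y)) * sum(regrets)

With r_a = (10, 10, 10) and r_r = (0, 0, 0), the three vectors of the next slide obtain exactly the same ψ value, which is what motivates aggregating the regrets with an OWA instead.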


RPM with an OWA

OWA

OWA(η) = ∑_{i=1}^n ω_i η_{⟨i⟩}, where η_i = σ_i(y_i) ∀i = 1...n,

with ω_1 > ω_2 > ... > ω_n > 0 and η_{⟨1⟩} ≥ η_{⟨2⟩} ≥ ... ≥ η_{⟨n⟩}.

Example: r^r = (0, 0, 0), r^a = (10, 10, 10), ω = (5/10, 3/10, 2/10)

    y          η          η_⟨1⟩   η_⟨2⟩   η_⟨3⟩   OWA    ψ_0   ψ_ε
    (4, 5, 9)  (6, 5, 1)    6       5       1     4.7     6    6 + 4ε
    (4, 8, 6)  (6, 2, 4)    6       4       2     4.6     6    6 + 4ε
    (4, 7, 7)  (6, 3, 3)    6       3       3     4.5     6    6 + 4ε

The reference-point function cannot discriminate between the three vectors (same ψ_0 and ψ_ε), whereas the OWA of the regrets prefers the best-balanced one, (4, 7, 7).
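The OWA column can be recomputed in two lines; a minimal Python sketch:

    def owa(eta, omega):
        """Inner product of the decreasing weights with the regrets sorted decreasingly."""
        return sum(w * e for w, e in zip(omega, sorted(eta, reverse=True)))

    omega = (0.5, 0.3, 0.2)
    for eta in [(6, 5, 1), (6, 2, 4), (6, 3, 3)]:
        print(eta, owa(eta, omega))   # 4.7, 4.6, 4.5 as in the table

For optimization rather than evaluation, minimizing an OWA with decreasing weights admits a linear-programming reformulation: since the sum of the k largest regrets equals min_{t_k, d_ik} { k·t_k + ∑_{i=1}^n d_ik : η_i ≤ t_k + d_ik, d_ik ≥ 0 ∀i }, one gets

    minOWA(η) = min ∑_{k=1}^n w_k (k·t_k + ∑_{i=1}^n d_ik)
                s.t. η_i ≤ t_k + d_ik   ∀i, k
                     d_ik ≥ 0           ∀i, k

with w_k = ω_k − ω_{k+1} (taking ω_{n+1} = 0).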


Main properties of OWA

• Symmetry: OWA(η_1, ..., η_n) = OWA(η_{τ(1)}, ..., η_{τ(n)}) for any permutation τ
• Pareto-monotonicity: η >_P η' ⇒ OWA(η) > OWA(η')
• Fairness (monotonicity w.r.t. Pigou-Dalton transfers): for all i, j ∈ {1, ..., n} such that η_i > η_j and all ε ∈ (0, η_i − η_j),
  OWA(η_1, ..., η_i − ε, ..., η_j + ε, ..., η_n) < OWA(η_1, ..., η_n)

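The table on the previous slide already exhibits the fairness property: (6, 3, 3) is obtained from (6, 5, 1) by a Pigou-Dalton transfer of ε = 2 from the second component to the third, and its OWA value is indeed lower. With the owa sketch above:

    assert owa((6, 3, 3), omega) < owa((6, 5, 1), omega)   # 4.5 < 4.7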


RPM with a Weighted OWA

OWA is symmetric on regrets ⇒ WOWA adds importance weights

[Figure: two panels (points labelled I and A): different importance weights lead to different compromise solutions]

RPM WOWA

WOWA(η) = ∑_{i=1}^n w_i(λ, η) η_{⟨i⟩}

where w_i(λ, η) = ϕ(∑_{k≤i} λ_{τ(k)}) − ϕ(∑_{k<i} λ_{τ(k)}), τ being the permutation that sorts η in nonincreasing order, the λ_i positive importance weights, and ϕ a monotone function interpolating the cumulative OWA weights (k/n, ∑_{j≤k} ω_j).

Theorem
For any weights ω_1 > ω_2 > ... > ω_n > 0 and any positive importance weights λ_i, if ȳ is properly nondominated with tradeoffs bounded by Δ = nλ̄βω_1 / (1 − nλ̄ω_1), where λ̄ = min_{i∈I} λ_i, i.e. if for any attainable outcome vector y the implication

    y_i > ȳ_i and y_k < ȳ_k  ⇒  (y_i − ȳ_i) / (ȳ_k − y_k) ≤ Δ    (1)

is valid for any i, k ∈ I, then there exist aspiration levels r_i^a and reservation levels r_i^r such that ȳ is an optimal solution of the corresponding problem.

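A minimal Python sketch of this definition, taking ϕ as the piecewise-linear interpolation of the cumulative OWA weights and assuming the importance weights λ are normalized to sum to 1:

    def wowa(eta, omega, lam):
        """WOWA(eta) = sum_i w_i(lam, eta) * eta_<i> with w_i built from phi."""
        n = len(eta)
        cum = [0.0]                        # phi interpolates (k/n, sum_{j<=k} omega_j)
        for w in omega:
            cum.append(cum[-1] + w)

        def phi(x):
            k = min(int(x * n), n - 1)     # segment containing x
            t = x * n - k                  # position inside the segment
            return cum[k] + t * (cum[k + 1] - cum[k])

        order = sorted(range(n), key=lambda i: -eta[i])   # decreasing regrets
        total, acc = 0.0, 0.0
        for i in order:
            w_i = phi(acc + lam[i]) - phi(acc)
            acc += lam[i]
            total += w_i * eta[i]
        return total

    # With uniform importance weights, WOWA collapses to the plain OWA:
    print(wowa((6, 3, 3), (0.5, 0.3, 0.2), (1/3, 1/3, 1/3)))   # -> 4.5 (up to float rounding)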


Example: Path-planning problems



Do we really need all the Pareto optimal solutions?

Example adapted from (Hansen, 80)

[Figure: a chain of states 0, 1, ..., N−1, N; in each state i the agent chooses between a reward vector (2^i, 0) and a reward vector (0, 2^i)]

For every pure policy π,

V_1^π(0) + V_2^π(0) = ∑_{i=0}^N 2^i = 2^{N+1} − 1,

so all value vectors lie on the same line and no pure policy dominates another.

• The number of Pareto optimal pure policies grows exponentially with the number of states

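The blow-up is easy to check by brute force for small N; in the sketch below, each pure policy is identified with the subset of stages whose reward 2^i is credited to the first objective.

    from itertools import product

    def pareto_front(vectors):
        """Keep the vectors that no other vector weakly dominates."""
        return [v for v in vectors
                if not any(w != v and w[0] >= v[0] and w[1] >= v[1]
                           for w in vectors)]

    N = 4                                         # chain with states 0..N
    values = []
    for choice in product((0, 1), repeat=N + 1):  # one binary choice per state
        v1 = sum(2**i for i, c in enumerate(choice) if c == 0)
        v2 = sum(2**i for i, c in enumerate(choice) if c == 1)
        values.append((v1, v2))

    print(len(pareto_front(values)), "of", len(values))   # all 2**(N+1) policies are Pareto optimal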