Correction to "Hidden Markov Model Multi-arm Bandits: A Methodology for Beam Scheduling in Multi-Target Tracking"

Vikram Krishnamurthy∗
Department of Electrical Engineering, University of Melbourne, Victoria 3010, Australia
email: [email protected]

August 19, 2002

∗ This work was supported by an ARC large grant and the Centre of Expertise in Networked Decision Systems.
We have discovered an error in the return-to-state formulation of the HMM multi-armed bandit problem in our recently published paper [4]. This note briefly outlines the error in [4] and describes a computationally simpler solution. Complete details, including proofs, of this simpler solution appear in the submitted paper [3]. The error in [4] is in the return-to-state argument given in Result 1, equation (12). It should read:
\[
\begin{split}
V^{(p)}(x^{(p)}, \bar{x}^{(p)}) = \max\Bigg\{ & R'(p)\,x^{(p)} + \beta \sum_{m=1}^{M_p} V^{(p)}\!\left( \frac{B^{(p)}(m)\,A^{(p)\prime}\,x^{(p)}}{\mathbf{1}'\,B^{(p)}(m)\,A^{(p)\prime}\,x^{(p)}},\ \bar{x}^{(p)} \right) \mathbf{1}'\,B^{(p)}(m)\,A^{(p)\prime}\,x^{(p)}, \\
& R'(p)\,\bar{x}^{(p)} + \beta \sum_{m=1}^{M_p} V^{(p)}\!\left( \frac{B^{(p)}(m)\,A^{(p)\prime}\,\bar{x}^{(p)}}{\mathbf{1}'\,B^{(p)}(m)\,A^{(p)\prime}\,\bar{x}^{(p)}},\ \bar{x}^{(p)} \right) \mathbf{1}'\,B^{(p)}(m)\,A^{(p)\prime}\,\bar{x}^{(p)} \Bigg\}
\end{split} \tag{1}
\]
Unfortunately, there appears to be no obvious way of obtaining a finite dimensional characterization of the above value function in the return-to-state formulation. In this note, it is shown that by introducing the retirement formulation [2] of the multi-armed bandit problem, a finite dimensional value iteration algorithm can be obtained for computing
the Gittins index of a POMDP bandit. To avoid repetition, we use exactly the same notation as [4]. For each project p, let M^{(p)} denote a positive real number such that
\[
0 \le M^{(p)} \le \bar{M}^{(p)}, \qquad \bar{M}^{(p)} \triangleq \max_{i \in N_p} R(p, i). \tag{2}
\]
To simplify subsequent notation, we omit the superscript p in M^{(p)} and \bar{M}^{(p)}. In terms of a parameterized retirement reward M, the Gittins index [1], [2] of project p with information state x^{(p)} can be defined as
\[
\gamma^{(p)}(x^{(p)}) \triangleq \min\{ M : V^{(p)}(x^{(p)}, M) = M \} \tag{3}
\]
where V^{(p)}(x^{(p)}, M) satisfies the Bellman functional recursion
\[
V^{(p)}(x^{(p)}, M) = \max\left\{ R'(p)\,x^{(p)} + \beta \sum_{m=1}^{M_p} V^{(p)}\!\left( \frac{B^{(p)}(m)\,A^{(p)\prime}\,x^{(p)}}{\mathbf{1}'_{N_p}\,B^{(p)}(m)\,A^{(p)\prime}\,x^{(p)}},\ M \right) \mathbf{1}'_{N_p}\,B^{(p)}(m)\,A^{(p)\prime}\,x^{(p)},\ M \right\} \tag{4}
\]
The N-th order approximation of V^{(p)}(x^{(p)}, M) is obtained via the following value iteration algorithm for k = 1, \dots, N:
\[
V_{k+1}^{(p)}(x^{(p)}, M) = \max\left\{ R'(p)\,x^{(p)} + \beta \sum_{m=1}^{M_p} V_k^{(p)}\!\left( \frac{B^{(p)}(m)\,A^{(p)\prime}\,x^{(p)}}{\mathbf{1}'_{N_p}\,B^{(p)}(m)\,A^{(p)\prime}\,x^{(p)}},\ M \right) \mathbf{1}'_{N_p}\,B^{(p)}(m)\,A^{(p)\prime}\,x^{(p)},\ M \right\} \tag{5}
\]
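To make (3)–(5) concrete, the following is a minimal numerical sketch, not taken from [3] or [4], for the special case of a fully observed Markov chain bandit, where the information state reduces to the chain state and (5) becomes a finite dimensional iteration. It uses Whittle's lump-sum retirement reward, so the index is recovered as (1 − β) times the critical retirement reward; the function name and interface are hypothetical.

```python
import numpy as np

def gittins_full_obs(A, r, beta, n_grid=2000, n_iter=500, tol=1e-8):
    """Hypothetical sketch: Gittins indices of a fully observed Markov chain
    bandit via the retirement formulation. A: N x N transition matrix (rows
    are from-states), r: length-N reward vector, beta: discount in (0, 1)."""
    N = len(r)
    # Calibrate over a grid of lump-sum retirement rewards M; the critical M
    # can be as large as max(r) / (1 - beta).
    Ms = np.linspace(0.0, r.max() / (1.0 - beta), n_grid)
    V = np.tile(Ms, (N, 1))                 # V[i, j] ~ V(state i, reward Ms[j])
    for _ in range(n_iter):
        # Analogue of (5): continue and collect r, or retire and collect M.
        V = np.maximum(r[:, None] + beta * (A @ V), Ms[None, :])
    gamma = np.empty(N)
    for i in range(N):
        # Analogue of (3): smallest M at which retiring is already optimal.
        stop = np.flatnonzero(V[i] - Ms <= tol)
        M_crit = Ms[stop[0]] if stop.size else Ms[-1]
        gamma[i] = (1.0 - beta) * M_crit    # Whittle's normalization
    return gamma
```

For a constant reward vector r ≡ c, this returns γ ≈ c up to grid resolution, consistent with the remark later in this note that identical rewards give γ^{(p)}(x) = \bar{M} for all x.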
Note that Corollary 1 of [4] holds for the Gittins index. Similar to Sec. 2.4 of [4], we need to compute a finite dimensional characterization of the value iteration algorithm (5). This is done as follows. Define the (N_p + 1) dimensional augmented information state \bar{x} \in \{ [x', 0]', [\mathbf{0}'_{N_p}, 1]' \}, where x \in X^{(p)}. Define an augmented observation process \bar{y}_k \in \{1, \dots, M_p + 1\} and the corresponding (N_p + 1) \times (N_p + 1) transition and observation probability matrices as
\[
A_1^{(p)} = \begin{pmatrix} A^{(p)} & \mathbf{0}_{N_p} \\ \mathbf{0}'_{N_p} & 1 \end{pmatrix}, \qquad
A_2^{(p)} = \begin{pmatrix} \mathbf{0}_{N_p \times N_p} & \mathbf{1}_{N_p} \\ \mathbf{0}'_{N_p} & 1 \end{pmatrix}, \qquad
B_1^{(p)} = \begin{pmatrix} B^{(p)} & \mathbf{0}_{N_p} \\ \mathbf{0}'_{M_p} & 1 \end{pmatrix}, \qquad
B_2^{(p)} = I_{(N_p+1) \times (N_p+1)},
\]
\[
B_1^{(p)}(m) = \operatorname{diag}(\text{column } m \text{ of } B_1^{(p)}); \qquad
B_2^{(p)}(m) = \operatorname{diag}(\text{column } m \text{ of } B_2^{(p)}), \qquad m \in \{1, \dots, M_p + 1\} \tag{6}
\]
\[
z \triangleq \begin{pmatrix} M/\bar{M} \\ 1 - M/\bar{M} \end{pmatrix}, \qquad 0 \le M \le \bar{M}. \tag{7}
\]
Define the information state \pi and the following coordinate transformation:
\[
\pi = z \otimes \bar{x},
\]
\[
\bar{A}_1 = I_{2 \times 2} \otimes A_1 = \begin{pmatrix} A_1 & 0 \\ 0 & A_1 \end{pmatrix}, \qquad
\bar{A}_2 = I_{2 \times 2} \otimes A_2 = \begin{pmatrix} A_2 & 0 \\ 0 & A_2 \end{pmatrix},
\]
\[
\bar{B}_1^{(p)}(m) = I_{2 \times 2} \otimes B_1^{(p)}(m), \qquad
\bar{B}_2^{(p)}(m) = I_{2 \times 2} \otimes B_2^{(p)}(m),
\]
\[
\bar{R}_1(p) = \begin{pmatrix} R'(p) & 0 & R'(p) & 0 \end{pmatrix}, \qquad
\bar{R}_2(p) = \bar{M} \begin{pmatrix} \mathbf{1}'_{N_p} & 1 & \mathbf{0}'_{N_p} & 0 \end{pmatrix}. \tag{8}
\]
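To illustrate the augmentation, the sketch below assembles the quantities in (6)–(8) for a single project with NumPy Kronecker products. It is an illustrative construction under the definitions above, not code from [3]; the function name, interface, and example matrices are hypothetical.

```python
import numpy as np

def augmented_model(A, B, M, M_bar):
    """Hypothetical sketch of the augmentation (6)-(8) for one project.
    A: Np x Np transition matrix; B: Np x Mp observation probability matrix;
    M: parameterized retirement reward with 0 <= M <= M_bar."""
    Np, Mp = B.shape
    # (6): append an absorbing "retired" state (action 1 = continue, 2 = retire)
    A1 = np.block([[A,                  np.zeros((Np, 1))],
                   [np.zeros((1, Np)),  np.ones((1, 1))]])
    A2 = np.block([[np.zeros((Np, Np)), np.ones((Np, 1))],
                   [np.zeros((1, Np)),  np.ones((1, 1))]])
    # (6): augmented observation matrix; extra observation Mp+1 flags retirement
    B1 = np.block([[B,                  np.zeros((Np, 1))],
                   [np.zeros((1, Mp)),  np.ones((1, 1))]])
    # (8): barred quantities via Kronecker products with I_{2x2}
    B1_bar = [np.kron(np.eye(2), np.diag(B1[:, m])) for m in range(Mp + 1)]
    A1_bar = np.kron(np.eye(2), A1)
    A2_bar = np.kron(np.eye(2), A2)
    # (7): two-point weight vector z encoding the retirement reward M
    z = np.array([M / M_bar, 1.0 - M / M_bar])
    return A1_bar, A2_bar, B1_bar, z

# Example usage with a hypothetical two-state, two-observation project:
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.3], [0.4, 0.6]])
A1_bar, A2_bar, B1_bar, z = augmented_model(A, B, M=0.5, M_bar=1.0)
pi = np.kron(z, np.array([0.5, 0.5, 0.0]))   # pi = z kron x_bar, with x_bar = [x', 0]'
```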
Then, using arguments similar to [4], it can be shown that Theorem 1 of [4] holds. Moreover, Theorem 2 of [4] holds with the following minor changes: V_k^{(p)}(x^{(p)}, M) replaces V_k^{(p)}(x^{(p)}, \bar{x}^{(p)}) in Statement 1. Statement 2 is unchanged. Statement 3 is modified to the following explicit formula for the Gittins index:
\[
\gamma_N^{(p)}(x^{(p)}) = \max_{\lambda_{i,N} \in \Lambda_N^{(p)}} \frac{\bar{M}\, \lambda'_{i,N}(3)\, x^{(p)}}{\left( \lambda_{i,N}(3) - \lambda_{i,N}(1) \right)' x^{(p)} + \bar{M}} \tag{9}
\]
where each vector \lambda_{i,N} \in \Lambda_N^{(p)} is of the form
\[
\lambda_{i,k} = \begin{pmatrix} \lambda'_{i,k}(1) & 0 & \lambda'_{i,k}(3) & 0 \end{pmatrix}', \qquad \text{where } \lambda_{i,k}(1), \lambda_{i,k}(3) \in \mathbb{R}^{N_p}. \tag{10}
\]
Statement 3 above gives an explicit formula for the Gittins index of the HMM multi-armed bandit problem. Recall that x_k^{(p)} is the information state computed by the pth HMM filter at time k. Given that we can compute the set of vectors \Lambda_N^{(p)}, (9) gives an explicit expression for the Gittins index \gamma_N^{(p)}(x_k^{(p)}) at any time k for project p. Note that if all elements of R(p) are identical, then \gamma^{(p)}(x) = \bar{M} for all x. Sections 2.5, 3 and 4 of [4], including the beam scheduling algorithm for the hybrid sensor of Sec. 3.2, still hold. It is worth noting that the above solution is computationally simpler, since the information state \pi here is a 2(N_p + 1) dimensional vector, whereas the information state considered in [4] is an N_p^2 dimensional vector.
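Given the vector set, evaluating (9) at a filtered information state is a direct maximization over the vectors. The sketch below assumes the pieces \lambda_{i,N}(1) and \lambda_{i,N}(3) of each vector have been stacked as rows of two arrays; all names are hypothetical.

```python
import numpy as np

def gittins_from_vectors(Lam1, Lam3, x, M_bar):
    """Evaluate the explicit Gittins index formula (9).
    Lam1, Lam3: K x Np arrays whose i-th rows hold lambda_{i,N}(1) and
    lambda_{i,N}(3); x: information state (length Np, nonnegative, sums to 1)."""
    num = M_bar * (Lam3 @ x)             # M_bar * lambda_{i,N}(3)' x
    den = (Lam3 - Lam1) @ x + M_bar      # (lambda_{i,N}(3) - lambda_{i,N}(1))' x + M_bar
    return float(np.max(num / den))      # maximize over the vectors in Lambda_N
```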
References

[1] D. P. Bertsekas. Dynamic Programming and Optimal Control, volumes 1 and 2. Athena Scientific, Belmont, Massachusetts, 1995.

[2] J. C. Gittins. Multi-armed Bandit Allocation Indices. Wiley, 1989.

[3] V. Krishnamurthy. A value iteration algorithm for partially observed Markov decision process multi-armed bandits. Submitted to IEEE Transactions on Automatic Control, 2001.

[4] V. Krishnamurthy and R. J. Evans. Hidden Markov model multi-arm bandits: A methodology for beam scheduling in multi-target tracking. IEEE Transactions on Signal Processing, 49(12):2893–2908, December 2001.