
Optimization Vol. 52, Nos. 4–5, August–October 2003, pp. 441–458

GEOMETRIC INTERPRETATION OF HAMILTONIAN CYCLES PROBLEM VIA SINGULARLY PERTURBED MARKOV DECISION PROCESSES

VLADIMIR EJOV, JERZY A. FILAR* and JANE THREDGOLD

Centre for Industrial and Applicable Mathematics, University of South Australia, Mawson Lakes Boulevard, Mawson Lakes, South Australia, 5095

(Received 09 January 2003; In final form 23 July 2003)

We consider the Hamiltonian cycle problem (HCP) embedded in a singularly perturbed Markov decision process (MDP). More specifically, we consider the HCP as an optimization problem over the space of long-run state-action frequencies induced by the MDP's stationary policies. We also consider two quadratic functionals over the same space. We show that when the perturbation parameter, $\varepsilon$, is sufficiently small, the Hamiltonian cycles of the given directed graph are precisely the maximizers of one of these quadratic functionals over the frequency space intersected with an appropriate (single) contour of the second quadratic functional. In particular, all these maximizers have a known Euclidean distance of $z_m(\varepsilon)$ from the origin. Geometrically, this means that Hamiltonian cycles, if any, are the points in the frequency polytope where the circle of radius $z_m(\varepsilon)$ intersects a certain ellipsoid.

Keywords: Hamiltonian cycle; Markov decision process; Perturbation parameter; Directed graph

Mathematics Subject Classifications 2000: 05C35; 05C38; 05C45; 90C35; 90C40

1 INTRODUCTION

Simply put, the Hamiltonian Cycle Problem (HCP) is: given a graph, find a simple cycle that contains all vertices of the graph (a Hamiltonian cycle, HC) or prove that no HC exists. Graphs possessing an HC are called Hamiltonian, and the corresponding property is called Hamiltonicity.¹ By a graph of order N we mean a simple N-vertex graph (without self-loops or multiple edges) that can be either symmetric (every edge admits two-way traffic) or directed (a digraph, with one-way traffic along each edge). This article is a continuation of a line of research [3–6] that aims to exploit the tools of Markov Decision Processes (MDPs) to study the properties of the HCP. More specifically, we consider the HCP as an optimization problem over the space of long-run state-action frequencies induced by the MDP's stationary policies. We also consider two quadratic functionals over the same space. We show that when the perturbation parameter, $\varepsilon$, is sufficiently small, the Hamiltonian cycles of the given directed graph are precisely the maximizers of one of these quadratic functionals over the frequency space intersected with an appropriate (single) contour of the second quadratic functional. In particular, all these maximizers have a known Euclidean distance of $z_m(\varepsilon)$ from the origin. Geometrically, this means that Hamiltonian cycles, if any, are the points in the frequency polytope where the circle of radius $z_m(\varepsilon)$ intersects a certain ellipsoid. Nonetheless, the corresponding optimization problem is that of maximizing a convex function over a convex set and, as such, it still belongs to the class of NP-complete problems.

In the line of research initiated in [4], the goal is to express the HCP as a problem involving the minimization of functionals from the theory of MDPs. This is done in the hope that the resulting embedding in a continuous setting, endowed with a probabilistic interpretation, will stimulate research into algorithmic procedures that exploit this novel relaxation of the HCP. Many of the most successful classical approaches of discrete optimization focus on solving a linear programming "relaxation" followed by heuristics that prevent the formation of subcycles (subtours). In our approach, we embed the underlying graph in a singularly perturbed MDP in such a way that Hamiltonian cycles can be identified with irreducible Markov chains and subcycles with nonexhaustive ergodic classes. Indirectly, this allows us to search for a Hamiltonian cycle in the frequency space of an MDP, which is a "nice" polytope with a nonempty interior, thereby converting the original discrete problem into a continuous one. The above embedding has led to both theoretical insights and new computational approaches.

*Corresponding author.
¹The name of the problem owes to the fact that Sir William Hamilton investigated the existence of such cycles on the dodecahedron graph [12].

ISSN 0233-1934 print; ISSN 1029-4945 online © 2003 Taylor & Francis Ltd. DOI: 10.1080/02331930310001611529
Note that this approach is essentially different from that adopted in the study of random graphs, where an underlying random mechanism is used to generate a graph (e.g., see Karp's seminal paper [8]). In our approach, the graph to be studied is given and fixed, but a controller can choose arcs according to a probability distribution, and with a small probability (due to a perturbation) an arc may take you to a node other than its "head". Of course, random graphs have played an important role in the study of Hamiltonicity; a striking result is that of Robinson and Wormald [11], who showed that, with high probability, k-regular graphs are Hamiltonian for $k \ge 3$.

Furthermore, our previous results open up many natural directions for investigation. The recently implemented (and still preliminary) heuristic interior-point algorithm of Ejov et al. [2] is based on the cited stochastic embedding and appears to perform competitively with alternative, general-purpose algorithms on various test problems, including the "Knight's tour" problem² on chessboards of size up to $32 \times 32$. This suggests that a conceptually innovative and computationally competitive algorithm for the HCP may be developed via this approach. Of course, much research has been done on algorithms for finding an HC in various restricted graph classes. For example, Parberry [10] developed a divide-and-conquer knight's tour algorithm on $n \times n$ chessboards for all even $n \ge 6$ that retains the rotational symmetry observed in the original Euler algorithm for $n = 6$ and $n = 8$. Clearly, algorithms designed for particular classes of graphs tend to outperform the best general-purpose algorithms when applied to graphs from these classes.

The article is organized as follows. In Section 2 we introduce an embedding of the HCP in a perturbed MDP that uses a different, higher-order perturbation than the one in [4]. This has two main advantages: the resulting Markov chains corresponding to stationary strategies are all irreducible (rather than just unichain), and the correspondence between stationary policies and points in the long-run frequency space is now one-to-one and onto, making the usual transformations invertible. These results are new even though their derivation is analogous to the one given in [4]. In Section 3 we introduce the new, quadratic, objective function and give a novel quadratic programming formulation of the HCP. While this quadratic program is still nonconvex, it lends itself to an interesting geometric interpretation and possesses a great deal of special structure that may lead to new algorithmic advances. Indeed, in [2] an interior-point heuristic applied to an earlier nonconvex programming formulation has yielded encouraging numerical results; for instance, Hamiltonian cycles were found in some graphs with as many as 1024 nodes. In Section 4 we prove that the mathematical programming formulation given in Section 3 indeed characterizes the Hamiltonian cycles of a graph as global maxima of our quadratic program, provided that the value of the perturbation parameter is sufficiently small. In the Appendix we demonstrate that all three cases of our main theorem can be realized (even for very small graphs), and we give a closed-form expression for one of the important quantities in our formulation, namely, the square of the Euclidean distance from the origin of the frequency vector induced by any Hamiltonian cycle.³

²"Knight's tour puzzle": start at any square of the $8 \times 8$ chessboard, visit every other square exactly once, and return to the starting square.

2 IRREDUCIBLE UNICHAIN MDP AS A REPRESENTATION OF A DIRECTED GRAPH

Consider a directed graph G with node set S and arc set A. We can associate an MDP, $\Gamma$, with the graph G as follows:

• The set of N nodes is the finite state space $S = \{1, 2, \dots, N\}$, and the set of arcs in G is the total action space $A = \{(i, j) : i, j \in S\}$, where, for each state (node) i, the action space $A(i)$ is the set of arcs $(i, j)$ emanating from this node and $A = \bigcup_{i=1}^{N} A(i)$. Note that we adopt the convention that j denotes both the arc $(i, j)$ and its "head" j, whenever there is no possibility of confusion as to the "tail" i.
• $\{p(j \mid i, a) = \delta_{aj} \mid a \in A(i),\ i \in S\}$, where $\delta_{aj}$ is the Kronecker delta, is the set of (one-step) transition probabilities.

A stationary policy $\pi$ in $\Gamma$ is a set of N probability vectors $\pi(i) = [\pi(i,1), \pi(i,2), \dots, \pi(i,N)]$, where $\pi(i,k)$ denotes the probability of choosing action $(i,k)$ whenever state i is visited. Of course, if the arc $(i,k) \notin A$, then $\pi(i,k) = 0$. It is clear that the $\{\pi(i,k)\}$ satisfy the stochastic condition

$$\sum_{k=1}^{N} \pi(i,k) = 1, \qquad \forall\, i = 1, \dots, N. \tag{1}$$

³We are indebted to an anonymous referee for encouraging us to explicitly derive the results reported in this Appendix.

A deterministic policy f is simply a stationary policy that selects a single action with probability 1 in every state. That is, $f(i,k) = 1$ for some $(i,k) \in A$; for convenience, we will write $f(i) = k$ in this case. Assume that 1 is the initial state (home node). We shall say that a deterministic policy f in $\Gamma$ is a Hamiltonian cycle (HC) in G if the subgraph $G_f$ with the set of arcs $\{(1, f(1)), (2, f(2)), \dots, (N, f(N))\}$ is an HC in G. If the subgraph $G_f$ contains a cycle of length $m < N$, we say that f has an m-subcycle. Next, we recall that any stationary policy $\pi$ induces a probability transition matrix

$$P(\pi) = [p(j \mid i, \pi)], \qquad i, j = 1, \dots, N,$$

where, for all $i, j \in S$,

$$p(j \mid i, \pi) = \sum_{a=1}^{N} p(j \mid i, a)\, \pi(i, a).$$

By $P^*(\pi)$ we denote the stationary distribution matrix, which is defined as the limiting Cesàro-sum matrix

$$P^*(\pi) := \lim_{T \to \infty} \frac{1}{T+1} \sum_{t=0}^{T} P^t(\pi), \qquad P^0(\pi) = I_N,$$

where $I_N$ is the $N \times N$ identity matrix. However, such a straightforward embedding of G in an MDP leads to the inevitable difficulty of confronting multiple ergodic classes and "transient" states, as the following examples for a four-node graph demonstrate:

• A policy f such that $f(1) = 2$, $f(2) = 1$, $f(3) = 4$ and $f(4) = 3$ induces a subgraph $G_f = \{(1,2), (2,1), (3,4), (4,3)\}$, which contains two 2-subcycles. The policy f induces a Markov chain with the probability transition matrix

$$P(f) = \begin{pmatrix} 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 \end{pmatrix}$$


which has two ergodic classes corresponding to the subcycles of $G_f$. This multiple ergodic classes phenomenon is reflected in the Cesàro-sum matrix

$$P^*(f) = \frac{1}{2}\begin{pmatrix} 1 & 1 & 0 & 0\\ 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1\\ 0 & 0 & 1 & 1 \end{pmatrix}.$$

• A policy g such that $g(1) = 2$, $g(2) = 1$, $g(3) = 2$ and $g(4) = 3$ induces a subgraph $G_g = \{(1,2), (2,1), (3,2), (4,3)\}$. Policy g induces a Markov chain

$$P(g) = \begin{pmatrix} 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 \end{pmatrix}$$

with the Cesàro-sum matrix

$$P^*(g) = \begin{pmatrix} \tfrac12 & \tfrac12 & 0 & 0\\ \tfrac12 & \tfrac12 & 0 & 0\\ \tfrac12 & \tfrac12 & 0 & 0\\ \tfrac12 & \tfrac12 & 0 & 0 \end{pmatrix},$$

which indicates that states 3 and 4 are transient. These and some other technical difficulties would vanish if we forced the MDP to be unichain and free of transient states. This is achieved by passing to a singularly perturbed MDP, $\Gamma_\varepsilon$, obtained from $\Gamma$ by introducing the perturbed transition probabilities $\{p_\varepsilon(j \mid i, a) \mid (i, a) \in A,\ j \in S\}$ defined by the rule

$$p_\varepsilon(j \mid i, a) := \begin{cases} 1 - (N-2)\varepsilon^2, & \text{if } i = 1 \text{ and } a = j > 1,\\ \varepsilon^2, & \text{if } i = 1 \text{ and } a \neq j > 1,\\ 1, & \text{if } i > 1 \text{ and } a = j = 1,\\ 1 - \varepsilon, & \text{if } i > 1 \text{ and } a = j > 1,\\ \varepsilon, & \text{if } i > 1 \text{ and } a \neq j = 1,\\ 0, & \text{in all other cases}, \end{cases}$$

for $\varepsilon \in \left(0, 1/\sqrt{N-2}\right)$. Note that 1 denotes the "home" node. For each pair of nodes $(i, j)$ (not equal to 1) corresponding to a (deterministic) arc $(i, j)$, our perturbation replaces that arc by a pair of "stochastic arcs" $(i, 1)$ and $(i, j)$ with weights $\varepsilon$ and $(1 - \varepsilon)$, respectively. This part of the trick eliminates multiple ergodic classes, establishing a "stochastic link" between any node and the home node. On the other hand, for each pair of nodes $(1, j)$ corresponding to a deterministic arc $(1, j)$, our perturbation replaces this arc by a collection of arcs consisting of the arc $(1, j)$ with weight $1 - (N-2)\varepsilon^2$ and the arcs $\{(1, k) \mid k \neq j,\ k \neq 1\}$ with weight $\varepsilon^2$ each. Since $j \neq 1$ (because G has no self-loops), there are exactly $(N-2)$ such "virtual" arcs $(1, k)$. This part of the trick eliminates possible transient states. As for the unperturbed MDP, any stationary policy $\pi$ induces a probability transition matrix

$$P_\varepsilon(\pi) = [p_\varepsilon(j \mid i, \pi)], \qquad i, j = 1, \dots, N,$$

where

$$p_\varepsilon(j \mid i, \pi) := \sum_{a=1}^{N} p_\varepsilon(j \mid i, a)\, \pi(i, a).$$
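The perturbation rule can be transcribed directly into code. The following minimal Python sketch is our own illustration rather than code from the paper: nodes are relabelled 0, ..., N − 1 with node 0 playing the role of the home node 1, a stationary policy is stored as one dict per node mapping each head a to π(i, a), and the names `perturbed_prob` and `perturbed_matrix` are hypothetical. `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def perturbed_prob(i, a, j, N, eps):
    """p_eps(j | i, a) for 0-based nodes; node 0 plays the role of home node 1."""
    if i == 0:
        # Home node: the chosen arc keeps 1 - (N-2) eps^2, every other
        # non-home node receives a "virtual" arc of weight eps^2.
        if j == a:
            return 1 - (N - 2) * eps ** 2
        return eps ** 2 if j != 0 else 0
    if a == 0:
        # An arc back to the home node is left deterministic.
        return 1 if j == 0 else 0
    # Ordinary arc (i, a): weight 1 - eps on the head, eps on the home node.
    if j == a:
        return 1 - eps
    return eps if j == 0 else 0


def perturbed_matrix(pi, eps):
    """P_eps(pi)[i][j] = sum over a of p_eps(j | i, a) * pi(i, a)."""
    N = len(pi)
    return [[sum(prob * perturbed_prob(i, a, j, N, eps) for a, prob in pi[i].items())
             for j in range(N)]
            for i in range(N)]


eps = Fraction(1, 10)
# Policy g of the four-node example (1->2, 2->1, 3->2, 4->3), written 0-based.
g = [{1: 1}, {0: 1}, {1: 1}, {2: 1}]
# A randomized policy: node 0 splits evenly between the arcs (0, 1) and (0, 2).
mixed = [{1: Fraction(1, 2), 2: Fraction(1, 2)}, {2: 1}, {3: 1}, {0: 1}]

P_g = perturbed_matrix(g, eps)
P_mixed = perturbed_matrix(mixed, eps)
```

Every row of the result sums to one: the home row spreads $(N-2)\varepsilon^2$ over the $N-2$ virtual arcs, and every other row splits its mass between the chosen head and the home node.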

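In this "unichain" situation the Cesàro-sum matrix $P^*_\varepsilon(\pi)$ consists of identical rows $\mathbf{p} = (p_1, \dots, p_N)$, where $\mathbf{p}$ is the (unique) solution of the system

$$\begin{cases} \mathbf{p}\,(I_N - P_\varepsilon(\pi)) = 0,\\[1mm] \displaystyle\sum_{j=1}^{N} p_j = 1. \end{cases} \tag{2}$$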
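As a sanity check on the two embeddings, the sketch below (our own Python illustration, with hypothetical helper names) takes policy g of the four-node example. For the unperturbed chain it approximates the Cesàro limit by the finite average over t = 0, ..., T, with T = 1999 chosen arbitrarily; for the perturbed chain it writes $P_\varepsilon(g)$ out by hand from the rule above, with the arbitrary value ε = 1/10, and solves system (2) exactly, replacing one redundant balance equation by the normalisation.

```python
from fractions import Fraction


def cesaro_average(P, T):
    """Finite Cesaro average (1 / (T + 1)) * sum_{t=0}^{T} P^t."""
    n = len(P)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # P^0 = I
    total = [row[:] for row in power]
    for _ in range(T):
        power = [[sum(power[i][k] * P[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
        total = [[a + b for a, b in zip(r, s)] for r, s in zip(total, power)]
    return [[x / (T + 1) for x in row] for row in total]


def solve_linear(A, b):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(v)] for row, v in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


def stationary(P):
    """Solve system (2): p (I - P) = 0 and sum(p) = 1."""
    n = len(P)
    # Column-balance equations for j = 0..n-2; the last one is redundant
    # (rows of P sum to 1) and is replaced by the normalisation.
    A = [[(1 if i == j else 0) - P[i][j] for i in range(n)] for j in range(n - 1)]
    A.append([1] * n)
    return solve_linear(A, [0] * (n - 1) + [1])


# Unperturbed policy g: 1->2, 2->1, 3->2, 4->3 (0-based rows and columns).
P_g = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
star_g = cesaro_average(P_g, 1999)

# The same policy under the perturbation, with t = 1 - eps and eta = eps^2.
eps = Fraction(1, 10)
t, eta = 1 - eps, eps ** 2
P_eps_g = [[0, 1 - 2 * eta, eta, eta],
           [1, 0, 0, 0],
           [eps, t, 0, 0],
           [eps, 0, t, 0]]
p = stationary(P_eps_g)
```

In the finite average the columns of the transient states 3 and 4 shrink like $1/(T+1)$, whereas every component of the perturbed stationary vector is strictly positive; for instance $p_4 = \varepsilon^2 p_1$, because node 4 is entered only through a virtual arc from the home node.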

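The absence of transient states ensures that $p_j > 0$ for all $j = 1, \dots, N$. This feature of $P^*_\varepsilon(\pi)$ is crucial for the one-to-one correspondence between the set of all stationary policies and the convex frequency polytope $F_\varepsilon$ in the Euclidean space X with coordinates $\{x_{ia} \mid (i, a) \in A\}$, due to the following construction (e.g., see [7]). Set

$$x_{ia} = M(\pi)_{i,a} := p_i(\pi)\, \pi(i, a), \tag{3}$$

where $p_i(\pi)$ denotes the i-th component of the stationary distribution solving (2) (that is, any entry in column i of $P^*_\varepsilon(\pi)$), and, also,

$$x_i := \sum_{a \in A(i)} x_{ia} = \sum_{a \in A(i)} p_i(\pi)\, \pi(i, a) = p_i(\pi), \tag{4}$$

since $\sum_{a \in A(i)} \pi(i, a) = 1$ for all $i = 1, \dots, N$.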


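The first equation of system (2), together with (1), implies that for all $j \in S$

$$\begin{aligned} 0 &= \sum_{i=1}^{N} \left(\delta_{ij} - P_\varepsilon(\pi)(i, j)\right) p_i = \sum_{i=1}^{N} \delta_{ij}\, p_i - \sum_{i=1}^{N} \sum_{a \in A(i)} p_\varepsilon(j \mid i, a)\, \pi(i, a)\, p_i\\ &= \sum_{i=1}^{N} \sum_{a \in A(i)} \pi(i, a)\, \delta_{ij}\, p_i - \sum_{i=1}^{N} \sum_{a \in A(i)} p_\varepsilon(j \mid i, a)\, \pi(i, a)\, p_i\\ &= \sum_{i=1}^{N} \sum_{a \in A(i)} \left(\delta_{ij} - p_\varepsilon(j \mid i, a)\right) x_{ia}. \end{aligned}$$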

Further, due to (1),

$$\sum_{i=1}^{N} \sum_{a \in A(i)} x_{ia} = \sum_{i=1}^{N} \sum_{a \in A(i)} p_i\, \pi(i, a) = \sum_{i=1}^{N} p_i = 1,$$

the transformation M maps onto (and, as we shall see below, it is a one-to-one map) the polyhedral set $F_\varepsilon$ defined by the linear constraints:

(i) $\displaystyle\sum_{i=1}^{N} \sum_{a \in A(i)} \left(\delta_{ij} - p_\varepsilon(j \mid i, a)\right) x_{ia} = 0, \quad j = 1, \dots, N;$

(ii) $\displaystyle\sum_{i=1}^{N} \sum_{a \in A(i)} x_{ia} = 1;$

(iii) $x_{ia} \ge 0, \quad a \in A(i),\ i = 1, \dots, N.$

In fact, $x_{ia} = 0$ if and only if $\pi(i, a) = 0$. The polytope $F_\varepsilon$ is called the (long-run) frequency polytope, and the space X with coordinates $x_{ia}$ is called the (long-run) frequency space. For a given $x \in F_\varepsilon$, let $S_0 := \{i \in S \mid x_i = 0\}$, where $x_i$ is defined in (4). Below we shall see that $S_0 = \emptyset$. Denote by $m(i)$ the number of arcs emanating from node i, $1 \le i \le N$. Define the stationary policy $\pi_x$ by

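$$\pi_x(i, a) := \begin{cases} \dfrac{x_{ia}}{x_i}, & i \notin S_0,\\[2mm] \dfrac{1}{m(i)}, & i \in S_0, \end{cases} \qquad a \in A(i),$$

and consider $\left[(x_1, \dots, x_N)(I - P_\varepsilon(\pi_x))\right]_j$ for every $j = 1, \dots, N$. Using $\sum_{a \in A(i)} \pi_x(i, a) = 1$, and noting that, by (iii), $x_i = 0$ forces $x_{ia} = 0$ for all $a \in A(i)$ whenever $i \in S_0$, we obtain

$$\begin{aligned} \left[(x_1, \dots, x_N)(I - P_\varepsilon(\pi_x))\right]_j &= \sum_{i=1}^{N} x_i \left(\delta_{ij} - P_\varepsilon(\pi_x)[i, j]\right)\\ &= \sum_{i \notin S_0} x_i \sum_{a \in A(i)} \left(\delta_{ij} - p_\varepsilon(j \mid i, a)\right) \frac{x_{ia}}{x_i} + \sum_{i \in S_0} x_i \sum_{a \in A(i)} \left(\delta_{ij} - p_\varepsilon(j \mid i, a)\right) \frac{1}{m(i)}\\ &= \sum_{i=1}^{N} \sum_{a \in A(i)} \left(\delta_{ij} - p_\varepsilon(j \mid i, a)\right) x_{ia} = 0, \end{aligned}$$

where the last equality is constraint (i).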

Thus, by constraint (ii), the vector $(x_1, \dots, x_N)$ is the stationary distribution vector for $\pi_x$; hence, by (2) and the absence of transient states, $x_i > 0$ for all $i = 1, \dots, N$. This means that, in fact, $S_0 = \emptyset$ and

$$\pi_x(i, a) = \frac{x_{ia}}{x_i}, \qquad \forall\, i = 1, \dots, N,$$

and, therefore, M is a one-to-one map with the inverse $M^{-1}(x) := \pi_x$. It is worth noting that the invertibility of M is one of the main benefits of using the more complex second-order perturbation introduced in this article. Under the first-order perturbation used in [4] it is possible to find points $x \in X$ such that $x_i = 0$ for possibly many nodes i, corresponding to transient states. Such points tell us nothing about which arcs (if any) are candidates for inclusion in a possible Hamiltonian cycle. However, now every $x_i > 0$ (because the corresponding Markov chain is irreducible), and the relative sizes of the contributing $x_{ia}$'s could be used as indicators for identifying "likely" arcs in Hamiltonian cycles.
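The construction of M and its inverse can be checked on a small example. In this Python sketch (our own; node labels are 0-based with home node 0, and the policy and ε = 1/10 are arbitrary choices), `frequency_map` forms x = M(π) from the exact stationary distribution, the balance residuals implement constraint (i), and `inverse_map` recovers π through $\pi_x(i, a) = x_{ia}/x_i$. All function names are hypothetical.

```python
from fractions import Fraction


def p_eps(i, a, j, N, eps):
    """Perturbed transition probability p_eps(j | i, a); node 0 is home."""
    if i == 0:
        return 1 - (N - 2) * eps ** 2 if j == a else (eps ** 2 if j != 0 else 0)
    if a == 0:
        return 1 if j == 0 else 0
    return 1 - eps if j == a else (eps if j == 0 else 0)


def stationary(P):
    """Exact solution of system (2): p (I - P) = 0 with sum(p) = 1."""
    n = len(P)
    # Balance equations for columns 0..n-2 (the last is redundant), then normalisation.
    M = [[Fraction(int(i == j)) - P[i][j] for i in range(n)] + [Fraction(0)]
         for j in range(n - 1)]
    M.append([Fraction(1)] * (n + 1))
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c] / M[c][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]


def frequency_map(pi, eps):
    """x = M(pi): x[i][a] = p_i(pi) * pi(i, a), cf. (3)."""
    N = len(pi)
    P = [[sum(q * p_eps(i, a, j, N, eps) for a, q in pi[i].items()) for j in range(N)]
         for i in range(N)]
    p = stationary(P)
    return {i: {a: p[i] * q for a, q in pi[i].items()} for i in range(N)}


def inverse_map(x):
    """pi_x(i, a) = x_ia / x_i; every x_i > 0 under the perturbation."""
    return [{a: v / sum(x[i].values()) for a, v in x[i].items()} for i in sorted(x)]


eps = Fraction(1, 10)
pi = [{1: Fraction(1, 2), 2: Fraction(1, 2)}, {2: 1},
      {3: Fraction(2, 3), 0: Fraction(1, 3)}, {0: 1}]
N = len(pi)
x = frequency_map(pi, eps)

# Constraint (i): sum_{i,a} (delta_ij - p_eps(j | i, a)) x_ia = 0 for every j.
balance = [sum((int(i == j) - p_eps(i, a, j, N, eps)) * v
               for i in x for a, v in x[i].items()) for j in range(N)]
total = sum(v for i in x for v in x[i].values())   # constraint (ii)
recovered = inverse_map(x)
```

Because the arithmetic is exact, the residuals of constraint (i) vanish identically and the round trip $M^{-1}(M(\pi)) = \pi$ holds exactly rather than up to rounding.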

3 MAIN RESULT

Consider the following two functions on the frequency space X, equipped with the standard Euclidean norm $\|\cdot\|$:

$$s(x) := \sum_{i=1}^{N} \sum_{a \in A(i)} x_{ia}^2 = \|x\|^2, \tag{5}$$

and

$$S(x) := \sum_{i=1}^{N} \left(\sum_{a \in A(i)} x_{ia}\right)^2 = \sum_{i=1}^{N} x_i^2.$$

Since $x_{ia} \ge 0$ for all $x \in F_\varepsilon$, it is clear that $S(x) \ge s(x)$ on $F_\varepsilon$, and equality holds if and only if $\pi_x = M^{-1}(x)$ is a deterministic policy (see [7]). Let $G_{HC}$ be a graph consisting of just a single Hamiltonian cycle on the N nodes of S, and let $f_{HC}$ be the (only) deterministic policy in $\Gamma(G_{HC})$ corresponding to this cycle. Set $x_{HC} := M(f_{HC})$ and, suppressing the dependence on $\varepsilon$, define

$$z^2 := S(x_{HC}) = s(x_{HC}).$$
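Both functionals depend only on the coordinates $x_{ia}$. A minimal Python sketch (function names are our own; a frequency vector is stored as a dict of dicts, node → {head: $x_{ia}$}) makes the inequality $S(x) \ge s(x)$ concrete. The toy vectors below are merely nonnegative and sum to 1; they are not claimed to lie in $F_\varepsilon$, since the inequality and its equality case use only $x \ge 0$.

```python
from fractions import Fraction


def small_s(x):
    """s(x) = sum over all arcs of x_ia^2 = ||x||^2, cf. (5)."""
    return sum(v * v for arcs in x.values() for v in arcs.values())


def big_S(x):
    """S(x) = sum over nodes of (sum_a x_ia)^2 = sum_i x_i^2."""
    return sum(sum(arcs.values()) ** 2 for arcs in x.values())


quarter = Fraction(1, 4)
# One positive coordinate per node: the pattern of a deterministic policy.
x_det = {1: {2: quarter}, 2: {3: quarter}, 3: {4: quarter}, 4: {1: quarter}}
# Node 1 splits its frequency between two arcs: the pattern of a randomized policy.
x_mix = {1: {2: quarter / 2, 3: quarter / 2}, 2: {3: quarter},
         3: {4: quarter}, 4: {1: quarter}}

# S(x) - s(x) is the sum of the cross terms 2 * x_ia * x_ib (a != b), which are >= 0.
gap_det = big_S(x_det) - small_s(x_det)
gap_mix = big_S(x_mix) - small_s(x_mix)
```

The gap is zero exactly when each node carries at most one positive coordinate, which, given $x_i > 0$, is the deterministic case.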


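It is easy to verify that $z^2$ is independent of the choice of the particular HC. Consider the convex subset $\Omega_\varepsilon \subset F_\varepsilon$ defined by

$$\Omega_\varepsilon = \{x \in F_\varepsilon \mid S(x) \le z^2\}.$$

THEOREM 1 There exists $\varepsilon_0 > 0$ such that for any $\varepsilon \in (0, \varepsilon_0)$:

(i) if $\Omega_\varepsilon = \emptyset$, then G is non-Hamiltonian;
(ii) if $\max_{x \in \Omega_\varepsilon} s(x) < z^2$, then G is also non-Hamiltonian;
(iii) if $\max_{x \in \Omega_\varepsilon} s(x) = z^2$, then G contains a Hamiltonian cycle $f_{HC} := M^{-1}(x_0)$, where $x_0$ is any point at which $\max_{\Omega_\varepsilon} s(x)$ is attained.

For the sake of continuity of discussion, the proof of this theorem is postponed until Section 4. It is clear that

$$\max_{x \in \Omega_\varepsilon} s(x) \le \max_{x \in \Omega_\varepsilon} S(x) \le z^2,$$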

so the three cases of Theorem 1 are exhaustive, and the theorem provides a criterion for the existence of an HC in G. This criterion has an obvious geometric interpretation due to the fact that $s(x) = \|x\|^2$. In the Appendix we provide simple examples in which items (i)–(iii) of Theorem 1 are realized, as well as the closed form of $z^2$ as a function of the perturbation parameter $\varepsilon$ and the number of nodes N of the graph. At this stage we have neither an estimate of $\varepsilon_0$ nor a (geometric) characterization of all graphs satisfying conditions (i)–(iii) of Theorem 1, even though such a characterization is clearly desirable. This is a subject for future work.

COROLLARY 1 Let $B(0, z_m)$ be the ball of radius $z_m$ centered at 0. Graph G contains an HC if and only if, for sufficiently small $\varepsilon$, the closed convex subset $\Omega_\varepsilon$ of $B(0, z_m)$ extends to the boundary of $B(0, z_m)$. In other words, G does not contain an HC if and only if, for sufficiently small $\varepsilon$, $\Omega_\varepsilon$ is contained in some smaller ball $B(0, z'_m)$ with $z'_m < z_m$.

While it could be argued that Theorem 1 merely replaces one difficult problem with another, there is now evidence that nonconvex quadratic programs arising from embeddings in MDPs can lead to useful numerical algorithms. This line of work was initiated in Andramonov et al. [1] and was continued in [2]. More generally, Murray and Ng [9] report considerable success in converting 0–1 integer programs into nonconvex nonlinear programs, which are then solved with interior-point heuristics. Thus the conventional wisdom that integer programs are "easier" than nonconvex continuous programs may no longer apply, especially when there is special structure that can be exploited.

4 S-FUNCTION AS A MEASURE OF BALANCE ON D

We use the partition of the set D of deterministic policies as in [6]. As above, with each $f \in D$ we associate the subgraph $G_f$ of G defined by $(i, j) \in G_f \iff f(i) = j$. We shall also denote a simple cycle (without self-intersections) of length k beginning at 1 by the set of arcs

$$c_1^k = \{(i_1 = 1, i_2), (i_2, i_3), \dots, (i_k, i_{k+1} = 1)\}, \qquad k = 2, 3, \dots, N.$$

Note that $c_1^N$ is an HC. If $G_f$ contains a cycle $c_1^k$, we write $G_f \supset c_1^k$. Let

$$D_1^k := \{f \in D \mid G_f \supset c_1^k\},$$

namely, the set of deterministic policies that trace out a simple cycle of length k, beginning at node 1, for each $k = 2, 3, \dots, N$. Of course, $D_1^N$ is the (possibly empty) set of policies that correspond to HCs, and any single $D_1^k$ can be empty depending on the structure of the original graph G. The B class of deterministic policies is defined by

$$B := D \setminus \left[\bigcup_{m=2}^{N} D_1^m\right].$$
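Which class a deterministic policy belongs to can be read off by following f from the home node until some node repeats. The Python sketch below is a hypothetical helper of our own (0-based nodes, home node 0, `f[i]` the head chosen at node i); it returns the class together with the length k of the first cycle met, the quantity used in the next paragraph to split B into the classes $B_k$.

```python
def classify(f):
    """Return ("D1", k) if f traces a simple cycle of length k through the home
    node 0, and ("B", k) if the first cycle reached from node 0 avoids it."""
    position = {0: 0}          # node -> step at which the walk first visited it
    node, step = f[0], 1
    while node not in position:
        position[node] = step
        node, step = f[node], step + 1
    k = step - position[node]  # length of the first cycle met by the walk
    return ("D1", k) if node == 0 else ("B", k)


examples = {
    "hamiltonian": [1, 2, 3, 0],        # 1->2->3->4->1
    "two_2_cycles": [1, 0, 3, 2],       # 1->2->1 and 3->4->3
    "tail_then_3_cycle": [1, 2, 3, 1],  # 1->2->3->4->2
}
labels = {name: classify(f) for name, f in examples.items()}
```

A Hamiltonian cycle is the case ("D1", N); any result of the form ("B", k) has $2 \le k \le N-1$.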

The B class is characterized by the property that, for each $f \in B$, the node to which f returns for the first time (starting from the home node 1) is different from the home node. What will matter is the size k of the first cycle appearing in $G_f$, which begins at $i_{\ell+1}$, continues to $i_{\ell+k}$ and returns to node $i_{\ell+1}$. Therefore, B can also be partitioned into

$$B = \bigcup_{k=2}^{N-1} B_k$$

with respect to the size of the first cycle appearing in f. The technical part of the proof of Theorem 1 is based on the following result.

LEMMA 1 Let $A = (a_{ij})$, $a_{ii} = 0$, $i = 1, \dots, m$, be a square matrix with $\{0, 1\}$ entries and with at most one unit entry per row. Then

(i) there exists a permutation matrix T such that $T^{-1}(I - tA)T$ has the "quasi upper-triangular" form

$$T^{-1}(I - tA)T = \begin{pmatrix}
1 & & & & & & & & \\
 & \ddots & & & & * & & & \\
 & & 1 & & & & & & \\
 & & & A_1 & & & & & \\
 & & & & \ddots & & & & \\
 & & & & & A_s & & & \\
 & & & & & & 1 & & \\
 & & 0 & & & & & \ddots & \\
 & & & & & & & & 1
\end{pmatrix},$$

where $*$ symbolizes the "immaterial" part of $T^{-1}(I - tA)T$ and each block $A_j$ is a square $r_j \times r_j$ matrix of the form

$$A_j = \begin{pmatrix}
1 & -t & 0 & \cdots & 0\\
0 & 1 & -t & \cdots & 0\\
\vdots & & \ddots & \ddots & \vdots\\
0 & \cdots & 0 & 1 & -t\\
-t & 0 & \cdots & 0 & 1
\end{pmatrix};$$

(ii) $\det(I - tA) = \prod_{j=1}^{s} \left(1 - t^{r_j}\right)$;

(iii) $(I - tA)^{-1} = TQT^{-1}$, where Q has a "quasi upper-triangular" form similar to that of $T^{-1}(I - tA)T$, with the blocks $A_j$ respectively replaced by the blocks

$$Q_j = \frac{1}{1 - t^{r_j}} \begin{pmatrix}
1 & t & t^2 & \cdots & t^{r_j - 1}\\
t^{r_j - 1} & 1 & t & \cdots & t^{r_j - 2}\\
\vdots & & \ddots & \ddots & \vdots\\
t & t^2 & \cdots & t^{r_j - 1} & 1
\end{pmatrix};$$

(iv) $(I - tA)^{-1} = O(\varepsilon^{-1})$ for $t = 1 - \varepsilon$.
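Parts (ii)–(iv) of Lemma 1 can be spot-checked in exact arithmetic. In the Python sketch below (our own code; the matrix A and the value t = 9/10 are arbitrary illustrations), A is encoded by a successor list with `None` marking a zero row; the determinant is computed by Gaussian elimination and compared with the product over cycle lengths, and the closed-form block $Q_j$ is multiplied against $A_j = I - tC$, where C is the cyclic shift.

```python
from fractions import Fraction


def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return d


def cycle_lengths(succ):
    """Cycle lengths of the partial map i -> succ[i] (None means a zero row)."""
    lengths, seen = [], set()
    for start in range(len(succ)):
        if start in seen:
            continue
        node, members = succ[start], [start]
        while node is not None and node != start and len(members) <= len(succ):
            members.append(node)
            node = succ[node]
        if node == start:
            seen.update(members)
            lengths.append(len(members))
    return lengths


def cycle_block_inverse(r, t):
    """Q_j of Lemma 1(iii): entry (i, j) equals t^((j - i) mod r) / (1 - t^r)."""
    return [[t ** ((j - i) % r) / (1 - t ** r) for j in range(r)] for i in range(r)]


# A 2-cycle (0<->1), a 3-cycle (2->3->4->2), a zero row (5) and a tail node (6 -> 2).
succ = [1, 0, 3, 4, 2, None, 2]
n = len(succ)
t = Fraction(9, 10)
I_minus_tA = [[(1 if i == j else 0) - (t if succ[i] == j else 0) for j in range(n)]
              for i in range(n)]

lhs = det(I_minus_tA)
rhs = Fraction(1)
for r in cycle_lengths(succ):
    rhs *= 1 - t ** r

# Multiply A_j = I - tC (C the cyclic shift of size 3) by the closed-form Q_j.
r = 3
A_j = [[(1 if i == j else 0) - (t if j == (i + 1) % r else 0) for j in range(r)]
       for i in range(r)]
Q_j = cycle_block_inverse(r, t)
product = [[sum(A_j[i][k] * Q_j[k][j] for k in range(r)) for j in range(r)]
           for i in range(r)]

# Lemma 1(iv): with t = 1 - eps the largest entry of Q_j behaves like 1 / (r * eps).
growth = [eps * max(max(row) for row in cycle_block_inverse(3, 1 - eps))
          for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000))]
```

The zero row and the tail node contribute only unit pivots, so the determinant equals $(1 - t^2)(1 - t^3)$, one factor per cycle, while $\varepsilon$ times the largest entry of $Q_j$ stays near $1/3$ as $\varepsilon$ shrinks.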

Proof of Lemma 1 (Induction on the size of A) We start with $\det(I - tA)$. Assume that Lemma 1 holds for any A of size $k < n$, and let A be a square $n \times n$ matrix. If A contains a j-th row consisting of zero entries only, then, setting $T_{jn} = T_{nj} := 1$, we make this row the last one in $T^{-1}AT$. The matrix $\tilde{A}$ (which is A with the j-th row and j-th column removed) satisfies the same conditions as A and has size $(n-1) \times (n-1)$. Expanding the determinant of $(I - tA)$ with respect to row j, we complete the proof by induction. If A contains a column with at least two unit entries, then there must be another column in A with all zero entries (otherwise A would contain $n + 1$ or more units). If A contains a j-th column consisting of zeros only, then, setting $T_{j1} = T_{1j} := 1$ and expanding the determinant with respect to the first column, we again reduce the problem to a lower-dimensional one. Thus, it remains to consider the situation when A has exactly one unit entry in every row and every column, i.e., A corresponds to a permutation $\sigma \in S_n$. Let $\sigma = \sigma_1 \sigma_2 \cdots \sigma_s$ be the decomposition of $\sigma$ into independent cycles of sizes $r_1, \dots, r_s$, respectively. Then there exists a permutation $n \times n$ matrix T such that $T^{-1}AT$ is a block diagonal matrix with blocks of the off-diagonal form

$$\begin{pmatrix}
0 & 1 & 0 & \cdots & 0\\
0 & 0 & 1 & \cdots & 0\\
\vdots & & \ddots & \ddots & \vdots\\
0 & 0 & \cdots & 0 & 1\\
1 & 0 & \cdots & 0 & 0
\end{pmatrix}.$$

Hence $\det(I - tA) = \prod_{j=1}^{s} (1 - t^{r_j})$. Having the determinant in this form, we conclude that $T^{-1}AT$ contains a submatrix consisting of s blocks of the above form and, besides, several zero rows, which we collect as the last rows of $T^{-1}AT$, and several zero columns, which we make the first columns of $T^{-1}AT$ by an appropriate choice of T. This proves (i) and (ii). The form of $(I - tA)^{-1}$ in (iii) follows from the form of $T^{-1}(I - tA)T$. The estimate (iv) follows from the form (iii) and the fact that $1 - t^{r_j} = \Theta(\varepsilon)$ for $t = 1 - \varepsilon$. □

The following lemma demonstrates that the S-function is well designed to distinguish between an HC and other deterministic policies.

LEMMA 2
(i) If $f = f_{HC}$, then $S(x_{HC}) = 1/N + O(\varepsilon)$ for $x_{HC} = M(f_{HC})$.
(ii) If $f \in D_1^k$, $k < N$, then $S(x_f) = 1/k + O(\varepsilon)$ for $x_f = M(f)$.
(iii) If $f \in B_k$, then $S(x_f) = 1/k + O(\varepsilon)$ for $x_f = M(f)$.

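Proof of Lemma 2

1. Let $f_{HC}$ correspond to the HC $1 \to 2 \to 3 \to \cdots \to N \to 1$. Then

$$P_\varepsilon(f_{HC}) = \begin{pmatrix}
0 & 1 - (N-2)\eta & \eta & \eta & \cdots & \eta\\
\varepsilon & 0 & t & 0 & \cdots & 0\\
\varepsilon & 0 & 0 & t & \cdots & 0\\
\vdots & \vdots & & & \ddots & \vdots\\
\varepsilon & 0 & 0 & \cdots & 0 & t\\
1 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix},$$

where $t = 1 - \varepsilon$ and $\eta = \varepsilon^2$. For the solution $\mathbf{p} = (p_1, \dots, p_N)$ of (2) we observe

$$p_2 = (1 - (N-2)\eta)\, p_1 = p_1 + O(\varepsilon^2), \qquad p_3 = \eta\, p_1 + (1 - \varepsilon)\, p_2 = p_2 + O(\varepsilon) = p_1 + O(\varepsilon).$$

Assuming that $p_j = p_1 + O(\varepsilon)$ for $j < k \le N$, we derive

$$p_k = \eta\, p_1 + (1 - \varepsilon)\, p_{k-1} = p_{k-1} + O(\varepsilon) = p_1 + O(\varepsilon).$$

Thus $p_j = p_1 + O(\varepsilon)$ for $j = 2, \dots, N$, and the condition $\sum_j p_j = 1$ implies $p_j = 1/N + O(\varepsilon)$, $j = 1, \dots, N$. Since $x_i = p_i$ by (4), it follows that $S(x_{HC}) = \sum_{j=1}^{N} p_j^2 = 1/N + O(\varepsilon)$.

2.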

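Let $f \in D_1^k$ be of the form $1 \to 2 \to 3 \to \cdots \to k \to 1$, so that

$$P_\varepsilon(f) = \begin{pmatrix}
0 & 1 - (N-2)\eta & \eta & \cdots & \eta & \eta & \cdots & \eta\\
\varepsilon & 0 & t & \cdots & 0 & 0 & \cdots & 0\\
\vdots & & & \ddots & & \vdots & & \vdots\\
\varepsilon & 0 & 0 & \cdots & t & 0 & \cdots & 0\\
1 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0\\
 & & U & & & & tA_{N-k} &
\end{pmatrix},$$

where $t = 1 - \varepsilon$, $A_{N-k}$ is a square $(N-k) \times (N-k)$ matrix satisfying the conditions of Lemma 1 (i.e., it has $\{0, 1\}$ entries with at most one unit entry per row), and U is some rectangular $(N-k) \times k$ matrix. For $\tilde{p} := (p_{k+1}, p_{k+2}, \dots, p_N)$ and $e = (1, \dots, 1)$, the system (2) yields

$$\tilde{p} = \eta\, p_1 e + t\, \tilde{p} A_{N-k}, \quad \text{or} \quad \tilde{p}\,(I - tA_{N-k}) = \eta\, p_1 e.$$

Hence $\tilde{p} = \eta\, p_1 e\, (I - tA_{N-k})^{-1}$. According to Lemma 1, $(I - tA_{N-k})^{-1} = TQT^{-1}$ for some permutation matrix T. Note that $\eta = O(\varepsilon^2)$ and that, by Lemma 1(iv), $(I - tA_{N-k})^{-1}$ does not reduce the order by more than one power of $\varepsilon$. Thus

$$\tilde{p} = \eta\, p_1 e\, TQT^{-1} = O(\varepsilon).$$

For the remaining components $p_1, \dots, p_k$ of the stationary distribution we are in the situation of an HC of length k; hence

$$p_j = \frac{1}{k} + O(\varepsilon), \qquad j = 1, \dots, k.$$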

3. Let $f \in B_k$ correspond to the policy $1 \to 2 \to \cdots \to \ell \to \ell+1 \to \ell+2 \to \cdots \to \ell+k \to \ell+1$, so that the first $\ell + k$ rows of $P_\varepsilon(f)$ are given by

$$[P_\varepsilon(f)]_{ij} = \begin{cases} 1 - (N-2)\eta, & i = 1,\ j = 2,\\ \eta, & i = 1,\ j \ne 1, 2,\\ \varepsilon, & 2 \le i \le \ell + k,\ j = 1,\\ t, & 2 \le i \le \ell + k - 1,\ j = i + 1,\\ t, & i = \ell + k,\ j = \ell + 1,\\ 0, & \text{otherwise}, \end{cases} \qquad i \le \ell + k,$$

while the remaining rows $i > \ell + k$ form the block $\left(U_1 \;\; U_2 \;\; tA_{N-(\ell+k)}\right)$, where $U_1$ is an $(N - (\ell+k)) \times \ell$ matrix, $U_2$ is an $(N - (\ell+k)) \times k$ matrix and $A_{N-(\ell+k)}$ is a square $(N - (\ell+k)) \times (N - (\ell+k))$ matrix as in Lemma 1. As above, for $\tilde{p} = (p_{\ell+k+1}, \dots, p_N)$ we observe

$$\tilde{p} = \eta\, p_1 e\, TQT^{-1} = O(\varepsilon), \qquad \eta = \varepsilon^2,$$

as before. For $p_1, \dots, p_\ell$ we deduce from (2) that

$$p_1 = \varepsilon \sum_{j=2}^{\ell+k} p_j + O(\varepsilon) = O(\varepsilon),$$

and, similarly, $p_j = O(\varepsilon)$ for $j = 1, \dots, \ell$. Apart from the $O(\varepsilon)$ contribution of $p_1, \dots, p_\ell$ and $p_{\ell+k+1}, \dots, p_N$, the components $p_{\ell+1}, \dots, p_{\ell+k}$ imitate an HC of length k. As above, this implies

$$p_j = \frac{1}{k} + O(\varepsilon), \qquad j = \ell + 1, \dots, \ell + k.$$

□

COROLLARY 2 For $k < N$ and $\varepsilon$ sufficiently small, $S(x_{HC}) < S(x_f)$, where $x_f = M(f)$, for any $f \in D_1^k$ or $f \in B_k$.
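Lemma 2 and Corollary 2 can be illustrated by brute force on a tiny instance. The Python sketch below is our own check under stated assumptions: G is taken to be the complete digraph on N = 4 nodes (0-based, home node 0), ε = 1/1000, and the tolerance 1/100 in the checks is an arbitrary stand-in for the $O(\varepsilon)$ terms. It enumerates every deterministic policy, computes $S(x_f) = \sum_i p_i^2$ exactly, and pairs each value with the length k of the first cycle reached from the home node.

```python
from fractions import Fraction
from itertools import product


def p_eps(i, a, j, N, eps):
    """Perturbed transition probability p_eps(j | i, a); node 0 is home."""
    if i == 0:
        return 1 - (N - 2) * eps ** 2 if j == a else (eps ** 2 if j != 0 else 0)
    if a == 0:
        return 1 if j == 0 else 0
    return 1 - eps if j == a else (eps if j == 0 else 0)


def stationary(P):
    """Exact solution of p (I - P) = 0, sum(p) = 1 (system (2))."""
    n = len(P)
    M = [[Fraction(int(i == j)) - P[i][j] for i in range(n)] + [Fraction(0)]
         for j in range(n - 1)]
    M.append([Fraction(1)] * (n + 1))
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c] / M[c][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]


def first_cycle_length(f):
    """Length of the first cycle met when following f from the home node 0."""
    position, node, step = {0: 0}, f[0], 1
    while node not in position:
        position[node] = step
        node, step = f[node], step + 1
    return step - position[node]


def S_value(f, eps):
    """S(x_f) = sum_i p_i^2 for the deterministic policy f, since x_i = p_i."""
    N = len(f)
    P = [[p_eps(i, f[i], j, N, eps) for j in range(N)] for i in range(N)]
    return sum(q * q for q in stationary(P))


# All deterministic policies of the complete digraph on N = 4 nodes (no self-loops).
N, eps = 4, Fraction(1, 1000)
policies = [f for f in product(range(N), repeat=N) if all(f[i] != i for i in range(N))]
results = [(first_cycle_length(f), S_value(f, eps)) for f in policies]
hc_values = {S for k, S in results if k == N}
other_values = [S for k, S in results if k < N]
```

In this instance all six Hamiltonian cycles give one and the same value of S, in line with the remark in Section 3 that $z^2$ does not depend on the particular HC, and every other policy gives a strictly larger value.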

Intuitively, we regard the HC as the most "balanced" deterministic policy. Hence, the S-function can be considered as a measure of balance in the frequency space X that distinguishes between the policies $f \in D$.

Proof of Theorem 1

If G contains an HC, then $x_{HC} \in F_\varepsilon$ and $s(x_{HC}) = S(x_{HC}) = z^2$, so $x_{HC} \in \Omega_\varepsilon$ and $\max_{x \in \Omega_\varepsilon} s(x) \ge z^2$; by contraposition, this proves (i) and (ii). Since $s(x) \le S(x) \le z^2$ for $x \in \Omega_\varepsilon$, it follows that $s(x) \le z^2$ on $\Omega_\varepsilon$. Suppose that $\max_{x \in \Omega_\varepsilon} s(x) = z^2$ and that this maximum is attained at $x_0 \in \Omega_\varepsilon$. As $z^2 = s(x_0) \le S(x_0) \le z^2$, we have $S(x_0) = z^2$. Since $s(x_0) = S(x_0)$, the policy $f_{x_0} = M^{-1}(x_0)$ is deterministic according to [6]. By Lemma 2, $S(x_0) = z^2$ if and only if $f_{x_0}$ is an HC. □

Acknowledgments

This research was supported in part by the ARC Discovery Grant Nos. A00000767 and DP0343028. We are also indebted to an anonymous referee for many helpful suggestions.

References

[1] M. Andramonov, J.A. Filar, A. Rubinov and P. Pardalos (2000). Hamiltonian cycle problem via Markov chains and min-type approaches. In: P.M. Pardalos (Ed.), Approximation and Complexity in Numerical Optimization: Continuous and Discrete Problems, pp. 31–47. Kluwer Academic Publishers, Dordrecht, The Netherlands.
[2] J. Filar, J. Gondzio and V. Ejov (2003). An interior point heuristic for the Hamiltonian cycle problem via Markov decision processes. Journal of Global Optimization (to appear).
[3] E. Feinberg (2000). Constrained discounted Markov decision process with Hamiltonian cycles. Mathematics of Operations Research, 25, 130–140.
[4] J.A. Filar and D. Krass (1994). Hamiltonian cycles and Markov chains. Mathematics of Operations Research, 19, 223–237.
[5] J.A. Filar and J-B. Lasserre (2000). A non-standard branch and bound method for the Hamiltonian cycle problem. ANZIAM J., 42(E), C556–C577.
[6] J. Filar and K. Vrieze (1996). Competitive Markov Decision Processes. Springer, New York.
[7] L.C.M. Kallenberg (1983). Linear programming and finite Markovian control problems. Mathematical Centre Tract, Vol. 148. Mathematisch Centrum, Amsterdam.
[8] R. Karp (1977). Probabilistic analysis of partitioning algorithms for the travelling-salesman problem in the plane. Mathematics of Operations Research, 2(3), 209–224.
[9] W. Murray and K.-M. Ng (2003). An algorithm for nonlinear optimization problems with discrete variables. Submitted to Mathematical Programming.
[10] I. Parberry (1997). An efficient algorithm for the Knight's tour problem. Discrete Applied Mathematics, 73, 251–260.
[11] R. Robinson and N. Wormald (1994). Almost all regular graphs are Hamiltonian. Random Structures and Algorithms, 5(2), 363–374.
[12] R.J. Wilson (1996). Introduction to Graph Theory. Longman, Harlow.

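APPENDIX

As in Lemma 2, denote by $p_{HC} := (p_1, \dots, p_N)$ the stationary distribution vector of the perturbed transition matrix $P_\varepsilon(f_{HC})$ induced by a Hamiltonian cycle. The vector $p_{HC}$ is the (common) row of the solution of (2) for $P_\varepsilon(f_{HC})$. Following the proof of part (i) of Lemma 2, with exact computation of $p_j$ for $j = 1, \dots, N$, one can show that the first component is

$$p_1 = \frac{(1-\varepsilon)\varepsilon}{1 - \varepsilon + \varepsilon^2 - \varepsilon^3 - (1-\varepsilon)^N \left(1 - \varepsilon - (N-2)\varepsilon^2\right)} \tag{6}$$

and every subsequent, j-th, component is

$$p_j = \frac{\left[\varepsilon + (1-\varepsilon)^{j-2}\left(1 - \varepsilon - (N-2)\varepsilon^2\right)\right](1-\varepsilon)\varepsilon}{1 - \varepsilon + \varepsilon^2 - \varepsilon^3 - (1-\varepsilon)^N \left(1 - \varepsilon - (N-2)\varepsilon^2\right)} \tag{7}$$

for $j = 2, \dots, N$. Therefore, $z^2 = s(x_{HC}) = S(x_{HC}) = \sum_{j=1}^{N} p_j^2$ evaluates to

$$z^2 = \left[2 - (N-2)\varepsilon^2 + (N-2)^2\varepsilon^4 + 2(1-\varepsilon)\left(1 - (1-\varepsilon)^{N-2}\right)\left(1 - \varepsilon - (N-2)\varepsilon^2\right) + \frac{(1-\varepsilon)^2\left(1 - (1-\varepsilon)^{2N-4}\right)\left(1 - \varepsilon - (N-2)\varepsilon^2\right)^2}{\varepsilon(2-\varepsilon)}\right] \left(\frac{(1-\varepsilon)\varepsilon}{1 - \varepsilon + \varepsilon^2 - \varepsilon^3 - (1-\varepsilon)^N\left(1 - \varepsilon - (N-2)\varepsilon^2\right)}\right)^2.$$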
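The closed forms (6), (7) and the expression for $z^2$ can be cross-checked against a direct solution of (2). This Python sketch is our own consistency check (0-based indices; the ranges of N and ε are arbitrary, with ε kept below $1/\sqrt{N-2}$). Exact rational arithmetic means that agreement is exact equality rather than agreement up to rounding.

```python
from fractions import Fraction


def stationary(P):
    """Exact solution of p (I - P) = 0, sum(p) = 1 (system (2))."""
    n = len(P)
    M = [[Fraction(int(i == j)) - P[i][j] for i in range(n)] + [Fraction(0)]
         for j in range(n - 1)]
    M.append([Fraction(1)] * (n + 1))
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c] / M[c][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]


def hc_matrix(N, eps):
    """P_eps(f_HC) for the cycle 1 -> 2 -> ... -> N -> 1 (0-based indices)."""
    eta, t = eps ** 2, 1 - eps
    P = [[Fraction(0)] * N for _ in range(N)]
    P[0][1] = 1 - (N - 2) * eta
    for j in range(2, N):
        P[0][j] = eta                     # virtual arcs out of the home node
    for i in range(1, N - 1):
        P[i][0], P[i][i + 1] = eps, t     # stochastic link home, main arc forward
    P[N - 1][0] = Fraction(1)             # the last node returns home
    return P


def closed_form(N, eps):
    """Formulas (6), (7) and the closed form of z^2 from the Appendix."""
    c = 1 - eps - (N - 2) * eps ** 2
    denom = 1 - eps + eps ** 2 - eps ** 3 - (1 - eps) ** N * c
    p1 = (1 - eps) * eps / denom                                        # (6)
    p = [p1] + [(eps + (1 - eps) ** (j - 2) * c) * (1 - eps) * eps / denom
                for j in range(2, N + 1)]                                # (7)
    bracket = (2 - (N - 2) * eps ** 2 + (N - 2) ** 2 * eps ** 4
               + 2 * (1 - eps) * (1 - (1 - eps) ** (N - 2)) * c
               + (1 - eps) ** 2 * (1 - (1 - eps) ** (2 * N - 4)) * c ** 2
               / (eps * (2 - eps)))
    return p, bracket * p1 ** 2


checks = []
for N in range(3, 9):
    for eps in (Fraction(1, 10), Fraction(1, 7), Fraction(1, 50)):
        p_exact = stationary(hc_matrix(N, eps))
        p_formula, z2 = closed_form(N, eps)
        checks.append(p_exact == p_formula and z2 == sum(q * q for q in p_exact))
```

Each entry of `checks` records whether the formulas reproduce the exact stationary vector and its squared norm for one pair (N, ε).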

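The graphs below provide examples that realize parts (i)–(iii) of Theorem 1. Let N = 4 and let $\Gamma_1$ be the graph (with the same notation $\Gamma_1$ used for its adjacency matrix)

$$\Gamma_1 = \begin{pmatrix} 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 \end{pmatrix}.$$

Graph $\Gamma_1$ admits only one deterministic policy, p, which is a union of two cycles, each of length two. By straightforward computation,

$$S(p) = s(p) = \frac{1}{2} \cdot \frac{1 - \varepsilon^2 + 2\varepsilon^4}{\left(1 + \varepsilon - \varepsilon^2\right)^2},$$

which, for $\varepsilon < 0.5$, is larger than

$$z^2 = \frac{4 - 6\varepsilon + \varepsilon^2 + 8\varepsilon^3 - 14\varepsilon^4 + 10\varepsilon^5 + 9\varepsilon^6 - 12\varepsilon^7 + 4\varepsilon^8}{\left(4 - 3\varepsilon - 2\varepsilon^2 + 5\varepsilon^3 - 2\varepsilon^4\right)^2}.$$

Since $F_\varepsilon$ consists of the single point induced by p, it follows that $\Omega_\varepsilon = \emptyset$ for $\Gamma_1$, which realizes case (i) of Theorem 1.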
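The comparison between $S(p)$ and $z^2$ for $\Gamma_1$ can be reproduced exactly. The Python sketch below is our own check, not the authors' computation (0-based indices; the sample values of ε are arbitrary). It solves (2) for the unique policy of $\Gamma_1$ and for a Hamiltonian cycle on four nodes, and compares both results with the rational functions above.

```python
from fractions import Fraction


def stationary(P):
    """Exact solution of p (I - P) = 0, sum(p) = 1 (system (2))."""
    n = len(P)
    M = [[Fraction(int(i == j)) - P[i][j] for i in range(n)] + [Fraction(0)]
         for j in range(n - 1)]
    M.append([Fraction(1)] * (n + 1))
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c] / M[c][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]


def S_of(P):
    """S = s = sum_i p_i^2 for a deterministic policy with transition matrix P."""
    return sum(q * q for q in stationary(P))


def gamma1_matrix(eps):
    """P_eps(p) for the unique policy of Gamma_1 (1<->2 and 3<->4), 0-based."""
    eta, t = eps ** 2, 1 - eps
    return [[0, 1 - 2 * eta, eta, eta], [1, 0, 0, 0], [eps, 0, 0, t], [eps, 0, t, 0]]


def hc4_matrix(eps):
    """P_eps(f_HC) for the Hamiltonian cycle 1->2->3->4->1, 0-based."""
    eta, t = eps ** 2, 1 - eps
    return [[0, 1 - 2 * eta, eta, eta], [eps, 0, t, 0], [eps, 0, 0, t], [1, 0, 0, 0]]


def S_closed(e):
    return Fraction(1, 2) * (1 - e ** 2 + 2 * e ** 4) / (1 + e - e ** 2) ** 2


def z2_closed(e):
    num = (4 - 6 * e + e ** 2 + 8 * e ** 3 - 14 * e ** 4 + 10 * e ** 5
           + 9 * e ** 6 - 12 * e ** 7 + 4 * e ** 8)
    return num / (4 - 3 * e - 2 * e ** 2 + 5 * e ** 3 - 2 * e ** 4) ** 2


samples = [Fraction(1, 100), Fraction(1, 10), Fraction(1, 4), Fraction(49, 100), Fraction(1, 2)]
rows = [(e, S_of(gamma1_matrix(e)), S_of(hc4_matrix(e))) for e in samples]
```

At ε = 1/2 the two values coincide (both equal 7/25), which is consistent with the threshold ε < 0.5 quoted in the text.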

Let N = 6 and let $\Gamma_2$ be the graph with the adjacency matrix

0

6 61 6 6 60 6 2 ¼ 6 60 6 6 61 4 0

0 1

0

0

0 1

0

0

1 0

0

0

0 0

0

1

0 0

1

0

0 0

1

1

1

3

7 07 7 7 07 7 7: 17 7 7 07 5 0

Graph $\Gamma_2$ admits, among others, the two deterministic policies

0

6 61 6 6 60 6 P1 ¼ 6 60 6 6 60 4 0

0

1 0

0

0

0 0

0

1

0 0

0

0

0 0

0

0

0 1

0

0

0 0

1

0

3

7 07 7 7 07 7 7, 17 7 7 07 5 0

which is a union of two cycles, and the policy $P_2 \in B_5$:

0 0

0

0

0

0

1

0

0

1

0

0

0

0

0

0

1

0

0

0

0

0 0

0

1

0

6 60 6 6 60 6 P2 ¼ 6 60 6 6 61 4

1

3

7 07 7 7 07 7 7: 07 7 7 07 5 0

Set $\lambda = 0.4$ in the convex combination $P_\lambda := \lambda P_1 + (1 - \lambda) P_2$. Thus,

0

1:0 "2

6 6 6 0:4 þ 0:6 " 0 6 6 6 6 1:0 " 1:0 1:0 " 6 P¼0:4 ¼ 6 6 6 1:0 " 0 6 6 6 6 0:4 " þ 0:6 0 6 4 1:0 "

0

0:4 1:0 "2

1:0 "2

1:0 "2

0:6 0:6 "

0

0

0

0

0

0

0

0:6 0:6 "

0

0:4 0:4 "

0

0

0:6 0:6 " 0:4 0:4 "

2:0 "2 þ 0:6

3

7 7 7 7 7 7 7 0 7 7, 7 0:4 0:4 " 7 7 7 7 7 0 7 5 0

0


and SðP¼0:4 Þ ¼

1 ð1656:0 "2

2

þ 684:0 þ 1884:0 " 1153:0 "5 þ 2132:0 "4 837:0 "3 þ 196:0 "6 Þ

ð77976:0 11796:0 "4 1815690:0 "5 þ 164176:0 "12 1529928:0 "11 þ 4980629:0 "10 þ 429552:0 " þ 375840:0 "2 þ 1929051:0 "6 þ 7329964:0 "7 3768610:0 "8 498960:0 "3 5193454:0 "9 , that is, for " < 0:1 is less than z2 ¼

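$$\frac{6 - 20\varepsilon + 30\varepsilon^2 - 134\varepsilon^4 + 338\varepsilon^5 - 251\varepsilon^6 - 232\varepsilon^7 + 694\varepsilon^8 - 706\varepsilon^9 + 393\varepsilon^{10} - 120\varepsilon^{11} + 16\varepsilon^{12}}{\left(6 - 10\varepsilon + 25\varepsilon^3 - 34\varepsilon^4 + 19\varepsilon^5 - 4\varepsilon^6\right)^2}.$$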

So $\Omega_\varepsilon$ contains the frequency vector of the policy $P_{\lambda=0.4}$, and, hence, case (ii) of Theorem 1 is realized on $\Gamma_2$. Part (iii) is realized on any Hamiltonian graph, for example, on a graph whose only deterministic policy is a Hamiltonian cycle.
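The expression for $z^2$ quoted for N = 6 can be cross-checked in the same way as the general closed form. The Python sketch below is our own check (0-based indices; the sample values of ε are arbitrary points of $(0, 1/2)$). It compares the rational expression with $\sum_j p_j^2$ computed from an exact solution of (2) for a Hamiltonian cycle on six nodes. Only the $z^2$ side of the comparison is checked; $S(P_{\lambda=0.4})$ is not recomputed here.

```python
from fractions import Fraction


def stationary(P):
    """Exact solution of p (I - P) = 0, sum(p) = 1 (system (2))."""
    n = len(P)
    M = [[Fraction(int(i == j)) - P[i][j] for i in range(n)] + [Fraction(0)]
         for j in range(n - 1)]
    M.append([Fraction(1)] * (n + 1))
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c] / M[c][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]


def hc_matrix(N, eps):
    """P_eps(f_HC) for the cycle 1 -> 2 -> ... -> N -> 1 (0-based indices)."""
    eta, t = eps ** 2, 1 - eps
    P = [[Fraction(0)] * N for _ in range(N)]
    P[0][1] = 1 - (N - 2) * eta
    for j in range(2, N):
        P[0][j] = eta
    for i in range(1, N - 1):
        P[i][0], P[i][i + 1] = eps, t
    P[N - 1][0] = Fraction(1)
    return P


def z2_six(e):
    """The rational expression for z^2 quoted for N = 6."""
    num = (6 - 20 * e + 30 * e ** 2 - 134 * e ** 4 + 338 * e ** 5 - 251 * e ** 6
           - 232 * e ** 7 + 694 * e ** 8 - 706 * e ** 9 + 393 * e ** 10
           - 120 * e ** 11 + 16 * e ** 12)
    den = (6 - 10 * e + 25 * e ** 3 - 34 * e ** 4 + 19 * e ** 5 - 4 * e ** 6) ** 2
    return num / den


samples = [Fraction(1, 100), Fraction(1, 20), Fraction(1, 10), Fraction(1, 3)]
agree = [sum(q * q for q in stationary(hc_matrix(6, e))) == z2_six(e) for e in samples]
```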