DUALITY AND THE COMPUTATION OF APPROXIMATE INVARIANT DENSITIES FOR NONSINGULAR TRANSFORMATIONS∗

CHRISTOPHER J BOSE† AND RUA MURRAY‡

Abstract. We investigate a class of optimization problems which arise in the approximation of invariant densities for a nonsingular and measurable transformation T acting on a finite measure space. The problems under consideration have convex integral-type objectives and finite moment constraints and include, for example, the maximum entropy and quadratic programming approaches previously studied in the literature. This article is a natural sequel to those investigations and to the paper [5], where a general class of convergent moment approximations was defined such that the limiting optimal solution is an invariant density for T. This article mainly concerns the solution of a single finite-moment problem arising from this general approximation scheme. Both theoretical aspects and computational issues are treated. Although the problem fits easily into the standard theory of duality in convex optimization, its dynamical origins lead to technical obstructions in the derivation of optimality conditions. In particular, the dual functional for our problem is neither strictly convex nor coercive, relating in part to the fact that the moment generating functions for the approximation scheme need not be pseudo-Haar. The method of the paper circumvents these obstructions and yields an unexpected benefit: each finite-moment approximation leads to rigorous bounds on the support of all invariant densities for T.

Key words. invariant measure, Frobenius–Perron operator, entropy-like objective, moment constraint, strong duality

AMS subject classifications. Primary 28D05; Secondary 37M25, 41A46, 49K27

1. Introduction. Let $X = (X; \mathcal B, \mu)$ be a Borel measure space. When $T: X \to X$ is measurable and non-singular with respect to µ, (T, X) is a dynamical system, and we are motivated by the question: can one find a T-invariant probability measure with a density function $f \in L^p(X;\mu)$ (usually p = 1)? A measure ν is T-invariant, or an invariant measure, if $\nu = \nu\circ T^{-1}$. Invariant measures determine equilibrium statistics of the dynamical system (T, X) (via Birkhoff's Ergodic Theorem), and those with densities do so for a µ-nontrivial set of orbits. T-invariant measures with densities are absolutely continuous invariant measures (ACIM). Usually they cannot be found in closed form, and it is highly desirable to develop computational strategies for approximating them. In [5] we studied a class of convex optimization problems on classical Banach spaces whose solutions robustly approximate T-invariant densities: the solutions $\{f_n\}$ to appropriately chosen sequences of optimization problems $(P_n)$ converge (in $L^p$) to a T-invariant density. In the current paper we investigate some of the technical issues that arise in solving such $(P_n)$, and we provide complete and explicit solutions for some special cases.

∗ Both authors would like to thank the Dept of Mathematics and Statistics, University of Victoria and the Dept of Mathematics, University of Waikato for hospitality during the period when this research was conducted.
† Dept of Mathematics and Statistics, University of Victoria, P.O. Box 3045 STN CSC, Victoria, BC, CANADA V8W 3P4. This research was supported by a grant from the Natural Sciences and Engineering Research Council of Canada.
‡ Dept of Mathematics, University of Waikato, Private Bag 3015 Hamilton, NEW ZEALAND.

The optimization problems have the general form
$$\text{Minimize } \Phi(f) = \int_X \phi(f(x))\,d\mu(x) \quad \text{subject to } f \in L^p(X;\mu) \text{ and } Mf = b \in \mathbb{R}^{N+1} \qquad (P_n)$$

(hereafter we denote $L^p(X;\mu) = L^p$). The constraint $M: L^p \to \mathbb{R}^{N+1}$ is of moment type; i.e. it is defined with respect to a given finite collection¹ $g_0, g_1, \dots, g_N$ of moment test functions in $L^q(X)$ by
$$(Mf)_i = \int f g_i\,d\mu, \qquad i = 0, 1, \dots, N.$$
For each n, the vector b is fixed, and q is the conjugate index: $\frac1p + \frac1q = 1$ (when p = 1, q = ∞). This setup generalizes a study of Ding [6].

Our aim is to develop numerical algorithms for invariant density approximation which work in practice: the methods must produce a convergent sequence of approximately invariant densities, and each iteration must involve the computation of a well-defined function which can be performed on a computer. Our optimization-based programme requires:
1. A suitable choice of generating functions (in $L^q$) such that any limit as n → ∞ of solutions to $(P_n)$ is an invariant density of the dynamical system (T, X); the dynamics of T are thus "encoded into M".
2. A choice of Φ which ensures norm convergence of the solutions of $(P_n)$ as n → ∞.
3. Any refinements needed to ensure that the solution of $(P_n)$ can be reduced to the solution of a finite number of algebraic equations.
4. Application of the method to specific examples to produce a convergent sequence of approximately invariant densities.
Each of these steps leads to nontrivial considerations. Most of (1) is addressed in [5], to which we refer the interested reader. The main requirement is that the moment test functions are derived from a sequence whose span is weak*-dense in $L^q$; the necessary details are given in Section 2. The requirements of (2) can be addressed with standard results from the literature [1, 3, 12, 11], and some details are collected in Section 2. Several example formulations are also presented in Section 2. The main effort in this paper is directed towards (3): establishing conditions which allow the primal optimization problems $(P_n)$ to be solved on a computer. Since each $(P_n)$ is convex, it is natural to write down the Lagrangian and pass to a dual (or conjugate) optimization problem, obtaining a concave, finite-dimensional and unconstrained problem $(D_n)$. While $(D_n)$ is derived easily using standard methods, for a large (and reasonable) class of moment formulations of the invariant measure problem the dual objective function is non-coercive².
This leads to difficulty in the derivation of necessary and sufficient optimality conditions, and possibly to failure of dual attainment (with consequent impediments in the practical solution of the optimization problems). We have identified two mechanisms leading to non-coercivity of $(D_n)$: (i) the moment test functions defining the constraint operators M may not be pseudo-Haar³, leading to unbounded contours (in fact hyperplanes) in the objective of $(D_n)$; (ii) regions of X which are transient under the dynamics of T can prevent the dual problem $(D_n)$ from attaining its maximum at all. Our main result (Theorem 3.3) is a condition which ensures dual attainment, leading to necessary and sufficient optimality conditions for the solution (Theorem 3.6). In the remainder of Section 3 we develop a domain restriction which guarantees that the conditions in the theorems are met. Despite the ad hoc appearance of the restriction, it is intimately connected with the dynamics of T, and its imposition does not alter the solution of the underlying invariant measure problem. Moreover, the restriction yields useful dynamical information (see Lemma 3.7 (2)) which is not normally revealed by other methods for invariant density approximation (for example, Ulam's method [15, 9, 7, 10, 5]). In Section 4 we present several examples, illustrating how non-coercivity of the dual problems arises and how it is dealt with. In particular, we show how to accomplish the domain restriction for an ‘Entropy method’ with simple moment test functions.

¹ Note that N need not equal n in all applications.
² That is, it has unbounded (upper) level sets; see [2].

2. Optimization formulation of the invariant measure problem. The T-invariance condition for densities can be encoded into a sequence of constraint operators M for problems $(P_n)$, and the optimization of Φ provides a convenient method of selecting a convergent sequence of approximately invariant measures. We now give a brief discussion of the dynamical origins of $(P_n)$; further detail and discussion about connections between this and other methods for invariant measure approximation may be found in [5].

2.1. Encoding dynamics as moment constraints. Let (T, X) be a dynamical system. When there is no possibility of confusion we write integration with respect to µ as "dx".
A σ-finite Borel measure ν is absolutely continuous if it has the form $d\nu = f\,d\mu$ for some measurable function⁴ f. We write $\nu \ll \mu$ for absolute continuity and $f = \frac{d\nu}{d\mu}$. We also assume that T is non-singular: $\mu\circ T^{-1} \ll \mu$, from which one can quickly deduce that $\nu\circ T^{-1} \ll \mu$ whenever $\nu \ll \mu$. As noted above, it is particularly interesting to find absolutely continuous probability measures which are invariant under T. For such a ν, $f = \frac{d\nu}{d\mu}$ is called an invariant density. Invariant densities can be investigated via a transfer operator on $L^p$ (usually p = 1). If $d\nu = f\,d\mu$ then $\nu\circ T^{-1} \ll \mu$, so there is a function $\hat f$ satisfying $d(\nu\circ T^{-1}) = \hat f\,d\mu$. Thus T induces an action P on $L^1$ by
$$Pf \overset{\text{def}}{=} \hat f = \frac{d(\nu\circ T^{-1})}{d\mu}.$$
The operator P is linear, positive (in the sense that f ≥ 0 implies Pf ≥ 0) and preserves integrals. P is called the Frobenius–Perron operator associated to T. Invariant densities $0 \le f \in L^1$ are fixed points of P. An alternative⁵ characterization of P is
$$\int Pf\,h\,d\mu = \int f\,h\circ T\,d\mu \qquad \text{for all } f\in L^1,\ h\in L^\infty, \qquad (2.1)$$
from which
$$Pf = f \quad\text{if and only if}\quad \int f\,(h\circ T - h)\,d\mu = 0 \ \text{ for all } h\in L^\infty. \qquad (2.2)$$

³ That is, the moment test functions may not be linearly independent µ-almost everywhere.
⁴ Normally we require f ≥ 0, although signed absolutely continuous measures also make sense in this context. The usual definition of absolute continuity is µ(B) = 0 ⇒ ν(B) = 0; the equivalence of the two is part of the Lebesgue–Radon–Nikodym Theorem.
⁵ If B is a measurable set then $\int Pf\,1_B\,d\mu = \nu(T^{-1}B) = \int f\,1_B\circ T\,d\mu$. For any simple h, linearity of the integral gives $\int Pf\,h\,d\mu = \int f\,h\circ T\,d\mu$, and the general case follows since simple functions are dense in $L^\infty$.
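Relation (2.1) and the fixed-point criterion (2.2) can be checked numerically on a concrete map. The sketch below is illustrative only: the doubling map T(x) = 2x mod 1 on [0, 1] with Lebesgue measure, its two-branch transfer operator, the chosen f and h, and the midpoint quadrature are all assumptions made for this example, not part of the text.

```python
def doubling(x):
    """The doubling map T(x) = 2x mod 1 on [0, 1) (an illustrative choice of T)."""
    return (2.0 * x) % 1.0


def transfer(f):
    """Frobenius-Perron operator of the doubling map.

    T has two inverse branches x/2 and (x+1)/2, each with |T'| = 2, so
    (Pf)(x) = (f(x/2) + f((x+1)/2)) / 2."""
    return lambda x: 0.5 * (f(0.5 * x) + f(0.5 * (x + 1.0)))


def integrate(func, n=100_000):
    """Composite midpoint rule for the integral of func over [0, 1]."""
    h = 1.0 / n
    return h * sum(func((k + 0.5) * h) for k in range(n))


f = lambda x: 2.0 * x        # a (non-invariant) probability density
h = lambda x: x * x          # a bounded test function
Pf = transfer(f)

lhs = integrate(lambda x: Pf(x) * h(x))            # ∫ (Pf) h dµ
rhs = integrate(lambda x: f(x) * h(doubling(x)))   # ∫ f (h∘T) dµ
print(lhs, rhs)

# Lebesgue density f = 1 satisfies P1 = 1, so the moment in (2.2) vanishes.
residual = integrate(lambda x: h(doubling(x)) - h(x))
print(residual)
```

For f(x) = 2x and h(x) = x² both sides of (2.1) equal 5/12 exactly, so the two printed values agree up to quadrature error, and the residual for f = 1 is zero up to the same error.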

In view of (2.2), it is natural to express the invariant density condition via a sequence of moment approximations. Suppose $H = \{h_1, h_2, \dots, h_N\} \subseteq L^\infty$ is a finite collection of functions. We say that f is approximately invariant up to H if
$$\int f\,(h_i\circ T - h_i)\,dx = 0, \qquad i = 1, 2, \dots, N. \qquad (2.3)$$
Setting $g_i = h_i\circ T - h_i$ and⁶ $g_0 = 1$, we define the set of approximately invariant functions to be the feasible set for $(P_n)$:
$$\mathcal F = \Big\{ f\in L^1 \;\Big|\; \int f g_0\,dx = 1,\ \int f g_i\,dx = 0,\ i = 1, 2, \dots, N \Big\} = \{ f\in L^1 \mid Mf = b\},$$
where $M: L^1 \to \mathbb{R}^{N+1}$ is defined by $(Mf)_i = \int f g_i\,d\mu$ and $b = (1, 0, \dots, 0)$. $(P_n)$ will be called feasible if $\mathcal F \ne \emptyset$, and each $f \in \mathcal F$ is feasible for $(P_n)$. We call the collection $G = \{g_0, g_1, g_2, \dots, g_N\}$ the set of moment test functions and H the set of generating functions for the approximation. Notice that we do not explicitly include the nonnegativity constraint f ≥ 0 in the definition of the feasible set; we prefer to impose this, when desired, using the objective function Φ. (In [5] we present the case $\Phi(f) = \frac12\|f\|_{L^2}^2$, where allowing f to assume negative values is convenient.) The function $g_0$ ensures that the approximate invariant densities are normalized⁷. Note that each $(P_n)$ has its own N, feasible set, moment test functions and generating functions. When there is any possibility of ambiguity, these will be denoted N(n), $\mathcal F_n$, $G_n$, $H_n$ respectively. Finally, we make the standing assumption that µ(X) < ∞. Then, in the definition of $\mathcal F$, the function space $L^1$ can be replaced by $L^p$, 1 < p < ∞. This is because $L^p \subseteq L^1$ for 1 ≤ p < ∞ and $L^\infty \subseteq L^q$ for 1 ≤ q ≤ ∞, so all the integrals in $\mathcal F_n$ remain well defined. This allows us to consider a range of objectives Φ (such as H, V and $V_+$ defined below).

Convergence of approximately invariant densities. The application of the problems $(P_n)$ relies on the constraints being such that $\mathcal F_\infty = \bigcap_{n\ge1}\mathcal F_n$ consists precisely of T-invariant densities. This condition is certainly satisfied when there exists an $L^p$ T-invariant density, $H_\infty = \{h_i\}_{i=1}^\infty$ is a sequence whose span is weak*-dense in $L^q$, and $H_n = \{h_1, \dots, h_n\}$. In this situation the sets $\{\mathcal F_n\}$ are nested ($\mathcal F_n \subseteq \mathcal F_m$ whenever n ≥ m), so
$$\mathcal F_\infty = \Big\{ f \;\Big|\; \int f\,d\mu = 1,\ \int (Pf - f)\,h\,d\mu = 0 \text{ for all } h\in\operatorname{span}(H_\infty)\Big\},$$
guaranteeing the condition in (2.2). (See [13] for generalizations of the $L^q$ weak*-density condition.) The nested condition on $\mathcal F_n$ and the density condition on $\{H_n\}$ can be weakened⁸, allowing other reasonable choices of $\{H_n\}$. The role of the objective functional Φ is to specify a selection of $f_n \in \mathcal F_n$ such that $\lim_{n\to\infty} f_n$ exists and is in $\mathcal F_\infty$.

⁶ $1_B$ denotes the characteristic function of the measurable subset B, and $1 = 1_X$.
⁷ This constraint also eliminates the trivial solution f = 0 from $(P_n)$.
⁸ In [5, Section 3] we establish a suitable convergence result under "lattice" and "weak eventual clustering" conditions.
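For a partition basis (introduced in Section 2.3 below) membership in the feasible set $\mathcal F$ can be tested exactly, because every $g_i$ is constant on the cells of the common refinement of the partition and its preimage. A minimal sketch, assuming the doubling map T(x) = 2x mod 1 and n equal cells $B_k = [k/n, (k+1)/n)$ (both hypothetical choices for illustration); exact rational arithmetic from Python's `fractions` module avoids quadrature error.

```python
from fractions import Fraction


def partition_tests(n):
    """Moment test functions g_0, ..., g_n for the doubling map with B_k = [k/n, (k+1)/n).

    Each g_k = 1_{B_k}∘T - 1_{B_k} is constant on the 2n cells
    C_c = [c/(2n), (c+1)/(2n)), so a function is stored as its 2n cell values.
    T maps C_c onto B_{c mod n}, and C_c lies inside B_{c // 2}."""
    tests = [[1] * (2 * n)]                            # g_0 = 1 (normalisation)
    for k in range(n):
        tests.append([int(c % n == k) - int(c // 2 == k) for c in range(2 * n)])
    return tests


def moments(f_cells, tests):
    """(Mf)_i = ∫ f g_i dµ, computed exactly for cellwise-constant f and g_i."""
    width = Fraction(1, len(f_cells))
    return [sum(width * fc * gc for fc, gc in zip(f_cells, g)) for g in tests]


n = 4
tests = partition_tests(n)
b = [1] + [0] * n

uniform = [Fraction(1)] * (2 * n)                      # f = 1: invariant for the doubling map
print(moments(uniform, tests) == b)                    # True, so f = 1 lies in F

left = [Fraction(2) if c < n else Fraction(0) for c in range(2 * n)]   # f = 2·1_[0,1/2)
print(moments(left, tests))
# [Fraction(1, 1), Fraction(-1, 4), Fraction(-1, 4), Fraction(1, 4), Fraction(1, 4)]
```

The second density is normalised (its zeroth moment is 1) but is not approximately invariant, which shows up as non-zero moments against $g_1, \dots, g_n$.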

2.2. Choice of convex functional. The objective functionals Φ are chosen for mathematical and practical convenience. Let µ(X) < ∞ and let $\phi: \mathbb{R} \to \mathbb{R}\cup\{+\infty\}$ be proper [11], lower semicontinuous and strictly convex. Define
$$\Phi(f) = \int_X \phi(f(x))\,dx.$$
(So φ is a normal convex integrand in the sense of Rockafellar [11].) We require Φ to be strictly convex and weakly lower semicontinuous, to have weakly compact lower level sets, and to have the Kadec property [3]: if $\Phi(f_n) \to \Phi(f)$ and $f_n \to f$ weakly as n → ∞, then $\|f - f_n\|_{L^p} \to 0$.

Remarks. 1. The function φ(f) need not be integrable for every $f \in L^p$; however, we assume that $(\phi(f))^-$ (its negative part) is integrable, and consequently $\Phi(f) = \int \phi(f)\,dx$ is unambiguously an element of (−∞, ∞]. For the examples below, this assumption holds. 2. Provided there is an invariant density $f_*$ for T with $f_* \in L^p$ and $\Phi(f_*) < \infty$, $(P_n)$ will be feasible for every choice of generators H.

Natural choices for Φ. In [5] we studied the following choices for $\phi: \mathbb{R} \to \mathbb{R}\cup\{+\infty\}$. For
$$\phi(t) = \eta(t) \overset{\text{def}}{=} \begin{cases} t\log t & \text{for } t > 0,\\ 0 & \text{if } t = 0,\\ +\infty & \text{if } t < 0,\end{cases}$$
we adopt the standard notation Φ = H, the (negative) Boltzmann–Shannon entropy on $L^1$. Notice that H(f) < ∞ implies that f ≥ 0 µ-almost everywhere. If
$$\phi(t) = v(t) \overset{\text{def}}{=} \tfrac12 t^2,$$
we optimize with respect to an ‘Energy’ functional which we denote by Φ = V, and when
$$\phi(t) = v_+(t) \overset{\text{def}}{=} \begin{cases} \tfrac12 t^2 & \text{if } t \ge 0,\\ +\infty & \text{if } t < 0,\end{cases}$$
we get a positively constrained Energy functional, denoted $\Phi = V_+$. Of course, the appropriate Banach space domain for the energy functionals V and $V_+$ is $L^2$. Properties of these and other "entropy-like" functionals are investigated in many papers, for example [3, 2, 12].

2.3. Three example setups.

Partition generating functions with ‘Energy’ objective. Let $\mathcal P$ be a partition of X into measurable subsets, $\mathcal P = \{B_i\}_{i=1}^n$, and let $H = \{1_{B_i}\}_{i=1}^n$ be the set of generating functions (we call this a partition basis). The moment test functions are therefore
$$g_i = 1_{B_i}\circ T - 1_{B_i} = 1_{T^{-1}B_i} - 1_{B_i}$$



and the approximately invariant densities can be considered as "invariant up to the discretization imposed by $\mathcal P$". Let $\Phi(f) = \frac12\|f\|_{L^2}^2$. This example is studied in detail in [5], and leads to a convergent invariant density approximation scheme under the assumption that T admits an invariant density in $L^2$. The main complication that arises with this method is that
$$\sum_{i=1}^n g_i = \sum_{i=1}^n (h_i\circ T - h_i) = 1_X\circ T - 1_X = 0, \qquad \text{since } \sum_{i=1}^n h_i = \sum_{i=1}^n 1_{B_i} = 1_{\cup B_i} = 1_X.$$
Consequently, the adjoint $M^*$ (defined below) has non-trivial kernel, leading to non-coercivity of the dual problem $(D_n)$ (see Section 3.2). This is dealt with easily, both analytically (Section 3.2) and numerically [5, Remark 7.1].

Partition generating functions with ‘Entropy’ objective. Again one uses a partition basis $H = \{1_{B_1}, 1_{B_2}, \dots, 1_{B_n}\}$, but now $\Phi(f) = H(f) = \int_X \eta(f(x))\,dx = \int_X f(x)\log f(x)\,dx$. For many interesting densities f (for example, the invariant densities for the logistic family of maps on [0, 1]), H(f) < ∞ while V(f) = ∞. So one gains applicability with this choice of Φ, but at a cost: the dual optimization problem is potentially less tractable. In fact, the dual problem can suffer from non-coercivity, wherein the optimizer occurs "at infinity". In Section 3.4 we elaborate on and resolve these difficulties by restricting the domain of integration in $(P_n)$.

Polynomial basis functions with ‘Entropy’ objective. Let X = [0, 1] ⊆ ℝ, let µ be Lebesgue measure and let $H_n = \{x, x^2, \dots, x^n\}$. In Ding [6] this generating set is used, along with the entropy objective H, to derive approximately invariant densities under the following dynamical assumptions:
(D1) the moment test functions $g_i(x) = (Tx)^i - x^i$, i = 1, 2, ..., n, are linearly independent; and
(D2) T admits a unique invariant density $f_*$, and this density satisfies $f_* > 0$ and $H(f_*) < \infty$.
In [4] we show, using techniques derived later in this article, that Ding's method can be extended to dynamical systems satisfying only
(D3) T admits an invariant density $f_*$ with $H(f_*) < \infty$ such that T is not of finite order with respect to $f_*\,d\mu$. (That is, there is no n > 0 such that $T^n = \mathrm{id}$, $f_*\,d\mu$-a.e.)

3. Main results.

3.1. The dual problem $(D_n)$. Since φ is a normal convex integrand in the sense of Rockafellar [11], we have a simple closed form for the dual functional, which we denote by Q:
$$\text{Maximize } Q(\lambda) = \langle\lambda, b\rangle - \int_X \phi^*([M^*\lambda](x))\,dx \quad \text{subject to } \lambda\in\mathbb{R}^{N+1}, \qquad (D_n)$$
where $M^*: \mathbb{R}^{N+1} \to L^q$ is the adjoint map defined by
$$M^*\lambda = \sum_i \lambda_i g_i \in L^q, \qquad (3.1)$$



and where φ* denotes the classical Fenchel (convex) conjugate of φ. Finally, weak duality holds:
$$Q(\lambda) \le \Phi(f) = \int \phi(f(x))\,dx \qquad \text{for all } \lambda\in\mathbb{R}^{N+1} \text{ and all } f\in L^p \text{ such that } Mf = b. \qquad (3.2)$$
For readers not familiar with this type of argument, we refer to [2, 12], and provide a short, self-contained derivation of these facts in Appendix I. The function φ* is automatically convex (a fact we will need below), and the main work is in identifying conditions which guarantee that $(D_n)$ attains its maximum at a finite λ (dual attainment); this is accomplished by proving that Q is coercive. As often occurs, dual attainment leads to necessary and sufficient conditions for both the dual and primal problems (Section 3.3).

3.2. Dual attainment. A critical issue for the solution of $(P_n)$ is whether or not the dual problem $(D_n)$ attains its maximum value. We remark at the outset that for many of our examples the functional Q fails to be coercive. The treatment we give is motivated by [2], although immediate application of the results of that paper is impeded by the fact that our $(P_n)$ do not necessarily admit feasible solutions in the quasi-relative interior of $L^p$ (the interesting T-invariant measures may not be supported on all of X). The first problem that can occur is that the operator $M^*$ can have non-trivial kernel; this problem was noted in the partition basis examples above, and is elaborated further in Section 4.1 below.

Lemma 3.1. Suppose that $(P_n)$ is feasible and write⁹ $\mathbb{R}^{N+1} = \operatorname{Ker}(M^*) \oplus \operatorname{Range}(M)$, the canonical orthogonal direct sum. Then
1. $b = (1, 0, 0, \dots, 0) \in \operatorname{Range}(M)$.
2. Q(·) is upper semicontinuous and constant on hyperplanes parallel to the subspace $\operatorname{Ker}(M^*)$.
3. If $\operatorname{Ker}(M^*) \ne \{0\}$ then Q is not coercive; however, in any event $\max_{\lambda\in\mathbb{R}^{N+1}} Q(\lambda) = \max_{\lambda\in\operatorname{Range}(M)} Q(\lambda)$. Moreover, $(D_n)$ attains its maximal value if and only if $Q|_{\operatorname{Range}(M)}$ attains its (relative) maximal value.
4. Write $\operatorname{Range}(M) = \operatorname{span}\{b\} \oplus \hat\Lambda$. If f is feasible for $(P_n)$ and $\hat\lambda \in \hat\Lambda$, then
$$\int M^*\hat\lambda\, f = \langle\hat\lambda, Mf\rangle = \langle\hat\lambda, b\rangle = 0$$
(where ⟨·,·⟩ is the usual inner product on $\mathbb{R}^{N+1}$).
5. If $\lambda = \lambda_0 + \alpha b + \hat\lambda$, where $\lambda_0 \in \operatorname{Ker}(M^*)$ and $\hat\lambda \in \hat\Lambda$, then
$$Q(\lambda) = \alpha - \int \phi^*(\alpha 1 + M^*\hat\lambda).$$
Proof. (1) Since $(P_n)$ is feasible, there is an $f \in L^p$ for which Mf = b. Thus b ∈ Range(M). Statements (2)–(3) follow immediately from (1) and the formula for Q in $(D_n)$. Statements (4)–(5) are direct computations.

⁹ Here, $\operatorname{Ker}(M^*) = \{\lambda\in\mathbb{R}^{N+1} \mid M^*\lambda = 0\ \mu\text{-a.e.}\}$.
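Lemma 3.1 (2)–(3) can be seen concretely in a hypothetical setting: the doubling map T(x) = 2x mod 1 with a partition basis of n = 4 equal cells (our choice, not the text's). The Gram matrix $G_{ij} = \int g_i g_j$ is the matrix of $MM^*$, and because $\langle\lambda, MM^*\lambda\rangle = \|M^*\lambda\|_{L^2}^2$ its kernel is $\operatorname{Ker}(M^*)$. The sketch computes the rank exactly and checks that (0, 1, ..., 1) lies in the kernel, so Q is constant along that direction.

```python
from fractions import Fraction


def partition_tests(n):
    """g_0 = 1 and g_k = 1_{B_k}∘T - 1_{B_k} (k = 0..n-1) for the doubling map
    T(x) = 2x mod 1 and B_k = [k/n, (k+1)/n), stored as values on the 2n cells
    [c/(2n), (c+1)/(2n)); T maps cell c onto B_{c mod n}, and cell c lies in B_{c // 2}."""
    tests = [[1] * (2 * n)]
    for k in range(n):
        tests.append([int(c % n == k) - int(c // 2 == k) for c in range(2 * n)])
    return tests


def gram(tests):
    """G_ij = ∫ g_i g_j dµ: the matrix of M M* in the standard basis of R^{N+1}."""
    width = Fraction(1, len(tests[0]))
    return [[sum(width * a * c for a, c in zip(gi, gj)) for gj in tests] for gi in tests]


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [list(row) for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * c for a, c in zip(rows[i], rows[r])]
        r += 1
    return r


n = 4
tests = partition_tests(n)
G = gram(tests)
kernel_vec = [0] + [1] * n                      # (0, 1, ..., 1)
M_star_v = [sum(l * g[c] for l, g in zip(kernel_vec, tests)) for c in range(2 * n)]
print(M_star_v)          # [0, 0, 0, 0, 0, 0, 0, 0]: g_1 + ... + g_n = 0
print(rank(G), n + 1)    # 4 5: G is singular and Ker(M*) = span{(0, 1, ..., 1)}
```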



Given this lemma, dual attainment will follow once we have established that Q has bounded upper level sets. This is done by exploiting the superlinear growth of the term $\int \phi^*(M^*(\cdot))$ (restricted to the linear subspace Range(M)) to produce a bound on the decay of Q(λ) as ‖λ‖ → ∞.

Lemma 3.2. With notation as in Lemma 3.1, and with ‖·‖ denoting the Euclidean norm on $\mathbb{R}^{N+1}$, assume also
1. φ* ≥ 0 and $\phi^*|_{[0,\infty)}$ is non-decreasing.
2. For every $\hat\lambda \in \hat\Lambda$ with $\hat\lambda \ne 0$ one has $[M^*\hat\lambda]_+ \ne 0$.
Then there exist $\gamma_0, \delta_0 > 0$ such that if $\lambda = \lambda_0 + \alpha b + \hat\lambda$ and $\alpha + \|\hat\lambda\|\gamma_0 \ge 0$, then
$$Q(\lambda) \le \alpha - \delta_0\,\phi^*(\alpha + \|\hat\lambda\|\gamma_0).$$

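Proof. First of all, note that $\int [M^*(\cdot)]_+\,d\mu$ is continuous on $\mathbb{R}^{N+1}$ and, by hypothesis (2), is positive for every non-zero $\hat\lambda \in \hat\Lambda$. Since the unit sphere in $\hat\Lambda$ is compact in $\mathbb{R}^{N+1}$, there is a γ > 0 such that
$$\|\hat\lambda\| = 1 \implies \int_X [M^*\hat\lambda]_+\,d\mu \ge \gamma.$$
Let $0 < \gamma_0 < \gamma/\mu(X)$, and for $\hat\lambda \ne 0$ let $A_\lambda = \{x\in X : [M^*(\hat\lambda/\|\hat\lambda\|)](x) > \gamma_0\}$. Then, by Hölder's inequality,
$$\gamma \le \int_X \Big[M^*\frac{\hat\lambda}{\|\hat\lambda\|}\Big]_+ = \int_{A_\lambda} \Big[M^*\frac{\hat\lambda}{\|\hat\lambda\|}\Big]_+ + \int_{X - A_\lambda} \Big[M^*\frac{\hat\lambda}{\|\hat\lambda\|}\Big]_+ \le \|M^*\|\,[\mu(A_\lambda)]^{1/p} + \mu(X)\,\gamma_0,$$
where $\|M^*\|$ denotes the operator norm of $M^*: \mathbb{R}^{N+1} \to L^q$. We therefore conclude that
$$\mu(A_\lambda) \ge \Big(\frac{\gamma - \mu(X)\gamma_0}{\|M^*\|}\Big)^p \overset{\text{def}}{=} \delta_0. \qquad (3.3)$$
Next, restricted to $A_\lambda$, $M^*(\lambda) = \alpha 1 + M^*(\hat\lambda) \ge \alpha + \|\hat\lambda\|\gamma_0 \ge 0$. Thus
$$\frac{1}{\mu(A_\lambda)}\int_{A_\lambda} M^*\lambda \ge \alpha + \|\hat\lambda\|\gamma_0. \qquad (3.4)$$
Now, since φ* is convex, we have by Jensen's inequality
$$\mu(A_\lambda)\,\phi^*\Big(\frac{1}{\mu(A_\lambda)}\int_{A_\lambda} M^*\lambda\,d\mu\Big) \le \int_{A_\lambda}\phi^*(M^*\lambda)\,d\mu.$$
Since φ* is non-decreasing on [0, ∞), (3.3) and (3.4) bound the left-hand side below by $\delta_0\,\phi^*(\alpha + \|\hat\lambda\|\gamma_0)$, and since φ* ≥ 0 we can bound the right-hand side above to obtain
$$\delta_0\,\phi^*(\alpha + \|\hat\lambda\|\gamma_0) \le \int_X \phi^*(M^*\lambda).$$
The lemma now follows from Lemma 3.1 (5).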



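Theorem 3.3. With notation as in Lemma 3.1, assume that
1. φ* ≥ 0 and $\phi^*|_{[0,\infty)}$ is non-decreasing;
2. $\lim_{s\to+\infty} \phi^*(s)/s = \infty$;
3. for every $\hat\lambda \in \hat\Lambda$ with $\hat\lambda \ne 0$ one has $[M^*\hat\lambda]_+ \ne 0$.
Then
$$\lim_{\|\lambda\|\to\infty,\ \lambda\in\operatorname{Range}(M)} Q(\lambda) = -\infty,$$
and the dual optimization problem $(D_n)$ attains its supremum.

Proof. It suffices to establish that for any sequence $\{\lambda_n\} \subset \operatorname{Range}(M)$ with $\|\lambda_n\| \to \infty$, $Q(\lambda_n)$ is unbounded below. First, note that $\lambda_n = \alpha_n b + \hat\lambda_n$. If any subsequence $\{\lambda_{n_i}\}$ has $\alpha_{n_i} \to -\infty$, then $Q(\lambda_{n_i}) \le \alpha_{n_i} \to -\infty$ by Lemma 3.1 (5) (recall that φ* ≥ 0). Thus we need only consider sequences $\{\lambda_n\}$ for which $\{\alpha_n\}$ is bounded below. If $\{\alpha_n\}$ is also bounded above, then since $\|\lambda_n\| \to \infty$ we must have $\lim_{n\to\infty}\|\hat\lambda_n\| = \infty$, so that $\alpha_n + \|\hat\lambda_n\|\gamma_0 \to \infty$. In particular, $\alpha_n + \|\hat\lambda_n\|\gamma_0 \ge 0$ for all large enough n, so by Lemma 3.2
$$Q(\lambda_n) \le \alpha_n - \delta_0\,\phi^*(\alpha_n + \|\hat\lambda_n\|\gamma_0) \to -\infty$$
(since $\lim_{s\to\infty}\phi^*(s) = \infty$ by hypothesis 2). The only other possibility is that $\{\alpha_n\}$ is unbounded above, in which case there is a subsequence $\{\lambda_{n_j}\}$ of $\{\lambda_n\}$ for which $\lim_{j\to\infty}\alpha_{n_j} = \infty$. Then, in view of hypothesis 2, there is an N such that
$$\alpha_{n_j} \ge 0 \quad\text{and}\quad \frac{\phi^*(\alpha_{n_j})}{\alpha_{n_j}} \ge \frac{2}{\delta_0} \qquad \text{for } j \ge N.$$
Since φ* is non-decreasing on [0, ∞) and $\alpha_{n_j} + \|\hat\lambda_{n_j}\|\gamma_0 \ge \alpha_{n_j} \ge 0$, Lemma 3.2 gives
$$Q(\lambda_{n_j}) \le \alpha_{n_j} - \delta_0\,\phi^*(\alpha_{n_j}) \le -\alpha_{n_j} \to -\infty \quad \text{as } j\to\infty.$$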
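Hypotheses 1–2 of Theorem 3.3 involve only the conjugate φ*. For the three integrands of Section 2.2 the conjugates are $\eta^*(s) = e^{s-1}$, $v^*(s) = s^2/2$ and $v_+^*(s) = \frac12(\max(s,0))^2$; these closed forms are standard calculations rather than statements from the text. The sketch compares them with a brute-force supremum over a grid (a hypothetical check, with grid range and tolerance chosen by us) and then tests non-negativity, monotonicity on [0, ∞) and superlinear growth.

```python
import math


def eta(t):
    """Entropy integrand: t log t for t > 0, 0 at t = 0, +inf for t < 0."""
    if t > 0:
        return t * math.log(t)
    return 0.0 if t == 0 else math.inf


def v(t):
    """Energy integrand t^2 / 2."""
    return 0.5 * t * t


def v_plus(t):
    """Positively constrained energy integrand."""
    return 0.5 * t * t if t >= 0 else math.inf


# Closed-form conjugates phi*(s) = sup_t (s t - phi(t)).
CONJUGATES = {
    "eta": (eta, lambda s: math.exp(s - 1.0)),
    "v": (v, lambda s: 0.5 * s * s),
    "v_plus": (v_plus, lambda s: 0.5 * max(s, 0.0) ** 2),
}


def conjugate_on_grid(phi, s, grid):
    """Brute-force supremum of s t - phi(t) over a finite grid of t values."""
    return max(s * t - phi(t) for t in grid)


grid = [-10.0 + 1e-3 * k for k in range(20_001)]      # t in [-10, 10], step 0.001
for name, (phi, phi_star) in CONJUGATES.items():
    for s in (-2.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0):
        assert abs(conjugate_on_grid(phi, s, grid) - phi_star(s)) < 1e-5, (name, s)
    # Hypothesis 1 of Theorem 3.3: phi* >= 0, and non-decreasing on [0, inf).
    assert all(phi_star(0.5 * k) >= 0 for k in range(-40, 41))
    on_half_line = [phi_star(0.5 * k) for k in range(41)]
    assert on_half_line == sorted(on_half_line)
    # Hypothesis 2: phi*(s) / s grows with s (sampled at s = 10 and s = 40).
    print(name, phi_star(10.0) / 10.0, phi_star(40.0) / 40.0)
```

All three integrands pass, which is consistent with Theorem 3.3 applying to H, V and $V_+$ whenever hypothesis 3 (a condition on the moment test functions, not on φ) also holds.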



Note. The limit in condition (2) could be replaced by a lim sup since φ* is convex. Theorem 3.3 is analogous to [2, Theorem 4.8], but condition (3) is unnecessary there due to the assumption of a strictly positive feasible point for $(P_n)$.

Example 3.4. Suppose X = [0, 1] and
$$T(x) = \begin{cases} 2x & \text{if } 0 \le x < \tfrac12,\\ 2(x - \tfrac12) + \tfrac12 & \text{if } \tfrac12 \le x < \tfrac34,\\ 2(x - \tfrac12) & \text{if } \tfrac34 \le x \le 1.\end{cases}$$
Then $f_* = 2\cdot 1_{[\frac12, 1]}$ is the unique invariant probability density for T. Let $\mathcal P = \{[0, \tfrac12), [\tfrac12, 1]\}$. The functions $g_i$ are
$$g_0 = 1, \qquad g_1 = -1_{[\frac14, \frac12)}, \qquad g_2 = 1_{[\frac14, \frac12)},$$
so $\operatorname{Ker}(M^*) = \operatorname{span}\{(0,1,1)^T\}$, $\operatorname{Range}(M) = \operatorname{span}\{(1,0,0)^T, (0,1,-1)^T\}$ and $\hat\Lambda = \operatorname{span}\{(0,1,-1)^T\}$. But evidently $M^*(0,1,-1)^T = -2\cdot 1_{[\frac14,\frac12)} \le 0$, so hypothesis (3) of Theorem 3.3 fails. In fact, using the entropy functional H as the objective, one easily computes the dual functional Q on the two-dimensional subspace Range(M) (where vectors take the form $(\alpha, \beta, -\beta)^T$) as
$$Q((\alpha,\beta,-\beta)^T) = \alpha - e^{\alpha-1}\Big(\frac14 e^{-2\beta} + \frac34\Big).$$
So Q is non-coercive: $\sup_{\operatorname{Range}(M)} Q = \log(4/3)$, but it is not attained at any point of Range(M), so dual attainment fails.

Polynomial basis functions with ‘Entropy’ objective. We can immediately apply Theorem 3.3 to establish coercivity of Q for Ding's polynomial basis maximum entropy method [6]. Condition (D1) above implies that $\operatorname{Ker}(M^*) = \{0\}$, so the decomposition in Lemma 3.1 is $\mathbb{R}^{n+1} = \operatorname{Range}(M)$ and $\hat\Lambda = \{\lambda\in\mathbb{R}^{n+1} \mid \lambda_0 = 0\} = (\operatorname{span}\{b\})^\perp$. Now suppose λ ≠ 0 and $\lambda^T b = 0$, so that $M^*\lambda \ne 0$. If $[M^*\lambda]_+ = 0$, then $M^*\lambda \le 0$ almost everywhere (and $M^*\lambda \ne 0$), so whenever f > 0 is feasible for $(P_n)$,
$$\langle\lambda, b\rangle = \langle\lambda, Mf\rangle = \int M^*\lambda\, f < 0.$$
Since (D2) guarantees the existence of a feasible, almost everywhere positive invariant density $f_*$, this calculation contradicts $\lambda^T b = 0$. We conclude that $[M^*\lambda]_+ \ne 0$, and Theorem 3.3 yields dual attainment. Even without condition (D1), the restriction of $(D_n)$ to Range(M) yields dual attainment by the same argument.

3.3. Necessary and sufficient optimality conditions. Once dual attainment is established, the dual $(D_n)$ and primal $(P_n)$ problems are linked by a standard derivation of optimality conditions [2, 11]. We begin by quoting a calculus lemma.

Lemma 3.5. Assume µ(X) < ∞ and that $\varphi: \mathbb{R} \to \mathbb{R}$ with $\varphi \in C^1$. Suppose $A: \mathbb{R}^{N+1} \to L^\infty(X)$ is linear and set, for every $z \in \mathbb{R}^{N+1}$,
$$q(z) = \int_X \varphi((Az)(x))\,d\mu(x).$$
Then
1. for every $z_0 \in \mathbb{R}^{N+1}$, $\varphi'(Az_0(\cdot)) \in L^1$;
2. for every $z_0 \in \mathbb{R}^{N+1}$, $(\nabla_z A)z_0(\cdot) \in [L^\infty]^{N+1}$;
3. q is Gâteaux differentiable at every $z_0 \in \mathbb{R}^{N+1}$, and in particular
$$(\nabla_z q(z_0))_i = \int \varphi'(Az_0(x))\,Ae_i(x)\,dx \in \mathbb{R}.$$

Theorem 3.6 (Necessary and sufficient optimality conditions). Assume that the primal problem $(P_n)$ is feasible and that $\varphi = \phi^*$ is smooth and satisfies the hypothesis of Lemma 3.5. If λ(n) yields a global maximum of Q in the dual formulation $(D_n)$, then
1. λ(n) satisfies
$$\int [\phi^*]'([M^*\lambda(n)](x))\,g_i(x)\,dx = b_i, \qquad i = 0, 1, 2, \dots, N; \qquad (3.5)$$
2. $f_n = [\phi^*]'(M^*\lambda(n)) \in L^1$ is feasible for $(P_n)$;
3. $Q(\lambda(n)) = \int \phi(f_n(x))\,dx$, and hence, from the weak duality condition (3.2), we conclude that $f_n$ is a minimizer in the primal problem $(P_n)$.

In particular, (3.5) is also sufficient for an optimal value of λ in $(D_n)$, and the function $f_n$ defined in part (2) is optimal in the primal problem.
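Theorem 3.6 reduces $(D_n)$ to the finite system (3.5), which can be solved numerically. A minimal sketch under stated assumptions: the doubling map T(x) = 2x mod 1, Ding's polynomial generators $H_2 = \{x, x^2\}$, the entropy objective (so $[\phi^*]'(s) = e^{s-1}$), midpoint quadrature, and Newton's method with backtracking on Q. The map, quadrature and solver are our choices for illustration, not a method prescribed by the text. Because Lebesgue measure is the unique invariant density of this map, the computed λ(n) should be close to (1, 0, 0), i.e. $f_n \approx 1$.

```python
import math

# Illustrative assumptions: doubling map, H_2 = {x, x^2}, entropy objective,
# midpoint quadrature with N_QUAD nodes on [0, 1].
N_QUAD = 4000
nodes = [(k + 0.5) / N_QUAD for k in range(N_QUAD)]
w = 1.0 / N_QUAD


def T(x):
    return (2.0 * x) % 1.0


# g_0 = 1 and g_i(x) = (Tx)^i - x^i, tabulated at the nodes.
G = [[1.0] * N_QUAD] + [[T(x) ** i - x ** i for x in nodes] for i in (1, 2)]
b = [1.0, 0.0, 0.0]
m = len(b)


def adjoint(lam):
    """M* lam = sum_i lam_i g_i at the nodes, cf. (3.1)."""
    return [sum(lam[i] * G[i][k] for i in range(m)) for k in range(N_QUAD)]


def dual(lam):
    """Q(lam) = <lam, b> - ∫ exp(M* lam - 1) dx."""
    return sum(l * bi for l, bi in zip(lam, b)) - w * sum(math.exp(s - 1.0) for s in adjoint(lam))


def gradient_and_hessian(lam):
    f = [math.exp(s - 1.0) for s in adjoint(lam)]          # [phi*]'(M* lam)
    grad = [b[i] - w * sum(fk * gk for fk, gk in zip(f, G[i])) for i in range(m)]  # residual of (3.5)
    hess = [[-w * sum(fk * gi * gj for fk, gi, gj in zip(f, G[i], G[j])) for j in range(m)]
            for i in range(m)]
    return grad, hess


def solve(A, r):
    """Gaussian elimination with partial pivoting for a small dense system A x = r."""
    n = len(r)
    aug = [row[:] + [ri] for row, ri in zip(A, r)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            factor = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= factor * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


lam = [0.0, 0.5, -0.5]                       # arbitrary starting point
for _ in range(50):
    grad, hess = gradient_and_hessian(lam)
    if max(abs(g) for g in grad) < 1e-11:
        break
    step = solve(hess, [-g for g in grad])   # Newton direction for maximising the concave Q
    t, q = 1.0, dual(lam)
    while dual([l + t * s for l, s in zip(lam, step)]) < q and t > 1e-8:
        t *= 0.5                             # backtrack so that Q does not decrease
    lam = [l + t * s for l, s in zip(lam, step)]

f_n = [math.exp(s - 1.0) for s in adjoint(lam)]   # primal solution, Theorem 3.6 (2)
print(lam)                                        # close to (1, 0, 0)
print(max(abs(v - 1.0) for v in f_n))            # f_n is close to the invariant density 1
```

Backtracking is included only as a safeguard for poor starting points; near the optimum, full Newton steps are taken. The Hessian is invertible here because (D1) holds for this map, which is exactly why no kernel restriction is needed.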



Proof. Using Lemma 3.5 we establish necessary conditions for λ(n) to maximize Q:
$$0 = -b_i + \int [\phi^*]'([M^*\lambda(n)](x))\,g_i(x)\,dx, \qquad i = 0, 1, 2, \dots, N,$$
so $f_n$ defined in part (2) satisfies $f_n \in L^1$ and the constraint $Mf_n = b$. Now, since φ* is convex, proper and smooth, one easily derives from classical facts (see Appendix I) that for all s ∈ ℝ
$$\phi^*(s) + \phi^{**}([\phi^*]'(s)) = s\,[\phi^*]'(s),$$
which, combined with $\phi^{**} = \phi$, yields
$$\phi([\phi^*]'(s)) + \phi^*(s) = s\,[\phi^*]'(s).$$
Substituting $s = [M^*\lambda(n)](x)$ and rearranging, we obtain
$$\phi(f_n(x)) = [M^*\lambda(n)](x)\,f_n(x) - \phi^*([M^*\lambda(n)](x)),$$
so $\phi(f_n(\cdot))$ is an integrable function, since both functions on the right are integrable. We conclude that $f_n$ is feasible for $(P_n)$ with $\Phi(f_n) < \infty$. Finally, integrating this last expression over x ∈ X and using $\int M^*\lambda(n)\,f_n = \langle\lambda(n), Mf_n\rangle = \langle\lambda(n), b\rangle$ yields $\Phi(f_n) = Q(\lambda(n))$, closing the duality gap and proving both that $f_n$ is a minimizer of Φ in $(P_n)$ and that λ(n) is a maximizer of Q in $(D_n)$ if and only if (3.5) holds.

3.4. Domain restriction with a partition basis. In Example 3.4, the existence of a non-positive, non-zero $M^*\lambda$ prevented Q from being coercive and destroyed any prospect of dual attainment. However, $\operatorname{supp}(M^*\lambda)$ was contained in a part of X which was transient under the action of T. In general, if f ≥ 0 is feasible¹⁰ for $(P_n)$ and $M^*\hat\lambda \le 0$ for some $\hat\lambda \in \hat\Lambda$, then Lemma 3.1 (4) gives $\int M^*\hat\lambda\,f = 0$, so $\operatorname{supp}(M^*\hat\lambda) \cap \operatorname{supp}(f)$ is µ-null. Any non-coercivity of Q caused by failure of condition (3) in Theorem 3.3 can therefore be attributed to the behaviour of $M^*$ on an unimportant part of X. Motivated by this (and justified in Lemma 3.7 and Section 4.2 below), we employ a $(P_n)$-specific domain restriction. Consider the (sub-)cone of $\mathbb{R}^{N+1}$ defined by $C = \{\hat\lambda \in \hat\Lambda \mid M^*\hat\lambda \le 0\}$ and set
$$X_0(n) = X - \bigcup_{\hat\lambda\in C}\{x\in X \mid [M^*\hat\lambda](x) < 0\}.$$

Lemma 3.7. Let the constraints in $(P_n)$ be with respect to a partition basis. Then:
1. $X_0(n)$ is a measurable subset of X.
2. Assume that φ satisfies the hypothesis of Theorem 3.3 and that Φ(f) < ∞ ⟹ f ≥ 0 almost everywhere. Then Φ(f) < ∞ and f feasible for $(P_n)$ imply $\operatorname{supp}(f) \subseteq X_0(n)$.
3. Under the same condition as in part (2), $X_0(n)$ is not a null set of X, and the value of the problem
$$\text{Minimize } \Phi_0(f) = \int_{X_0(n)} \phi(f(x))\,dx \quad \text{subject to } f\in L^p(X_0(n)) \text{ and } Mf = b \in \mathbb{R}^{N+1} \qquad (P_n')$$
is identical to the value of $(P_n)$. Dual attainment holds for the problem $(P_n')$: the dual functional
$$Q_0(\lambda) \overset{\text{def}}{=} \langle\lambda, b\rangle - \int_{X_0(n)} \phi^*(M^*\lambda)$$
attains its maximum over $\mathbb{R}^{N+1}$.

¹⁰ For example, any T-invariant density f for which Φ(f) < ∞.

Proof. (1) Observe that for each λ, $\{x \mid M^*\lambda(x) < 0\} \in \mathcal P \vee T^{-1}\mathcal P$, where $\mathcal P$ denotes the finite σ-algebra generated by the partition $\{B_i\}$. It follows that $X_0(n)$ is measurable, even though the union is over an uncountable parameter set. (2) When Φ(f) < ∞ and f is feasible for $(P_n)$, we have f ≥ 0 and $\int M^*\hat\lambda\,f = 0$ for all $\hat\lambda \in \hat\Lambda$ by Lemma 3.1 (4); for $\hat\lambda \in C$ this forces f = 0 almost everywhere on $\{M^*\hat\lambda < 0\}$. This implies that $\operatorname{supp}(f) \subseteq X_0(n)$. (3) Since $f_0 \in L^p(X_0(n))$ is feasible for $(P_n')$ if and only if its extension by zero, $f_0 1_{X_0(n)} \in L^p(X)$, is feasible for $(P_n)$, either both problems are infeasible, or there is a feasible f ≠ 0. In this case $\int_{X_0(n)} f\,d\mu = 1$, so $\mu(X_0(n)) > 0$. Furthermore $\Phi_0(f) = \Phi(f)$ for all feasible f. Dual attainment holds since, restricted to $X_0(n)$, hypothesis (3) of Theorem 3.3 holds.

In effect, we have moved troublesome vectors λ with $M^*\lambda \le 0$ into $\operatorname{Ker}(M^*)$ over the restricted measure space $X_0(n)$. Of course, the domain for $(P_n)$ is therefore changed, as is Φ, but our argument shows that the values of the two problems are identical, and the restricted problem has dual attainment.

Example 3.4 revisited. Recall that dual attainment failed due to the non-coercivity of Q. However, observe that if $0 \ne \hat\lambda \in \hat\Lambda$ and $M^*\hat\lambda \le 0$, then $\operatorname{supp}(M^*\hat\lambda) = [\tfrac14, \tfrac12)$ (note that $M^*\hat\lambda \ne 0$ since $\hat\lambda \in \operatorname{Range}(M) = (\operatorname{Ker}(M^*))^\perp$). By Lemma 3.1 (4), $\int M^*\hat\lambda\,f = 0$, and f ≥ 0 (provided f is feasible with H(f) < ∞). Thus $\operatorname{supp}(f) \subseteq [0,1] - [1/4, 1/2)$, so we let $X_0 = [0,1] - [1/4, 1/2)$ and solve
$$\text{Minimize } H_0(f) = \int_{X_0} \eta(f(x))\,dx \quad \text{s.t. } f \in L^1(X_0; dx) \text{ and } Mf = b.$$
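Both computations in Example 3.4 can be reproduced in a few lines. For fixed β, setting ∂Q/∂α = 0 gives the best α in closed form, so the supremum along each line is $-\log(\frac14 e^{-2\beta} + \frac34)$, which increases to log(4/3) without reaching it. On $X_0$ the functions $g_1, g_2$ vanish, so the restricted dual is $Q_0(\lambda) = \lambda_0 - \frac34 e^{\lambda_0 - 1}$, maximized at $\lambda_0 = 1 + \log(4/3)$, which gives $f_0 = \frac43 1_{X_0}$. These closed forms are our own short calculations, offered as a hedged check rather than results stated in the text.

```python
import math

LOG_4_3 = math.log(4.0 / 3.0)


def Q(alpha, beta):
    """Entropy dual of Example 3.4 on Range(M), evaluated at lambda = (alpha, beta, -beta)."""
    return alpha - math.exp(alpha - 1.0) * (0.25 * math.exp(-2.0 * beta) + 0.75)


def best_alpha(beta):
    """For fixed beta, dQ/dalpha = 0 gives exp(alpha - 1) = 1 / c(beta)."""
    c = 0.25 * math.exp(-2.0 * beta) + 0.75
    return 1.0 - math.log(c)


# Along beta -> infinity the dual values increase towards log(4/3) but never reach it.
values = [Q(best_alpha(beta), beta) for beta in (0.0, 1.0, 2.0, 5.0, 10.0)]
print(values, LOG_4_3)

# Restricted problem on X0 = [0,1] - [1/4,1/2): g_1 = g_2 = 0 on X0, so
# Q0(lambda) = lambda_0 - (3/4) exp(lambda_0 - 1), maximised at lambda_0 = 1 + log(4/3).
lam0 = 1.0 + LOG_4_3
Q0 = lam0 - 0.75 * math.exp(lam0 - 1.0)
f0 = math.exp(lam0 - 1.0)            # f_0 = 1_{X0} exp(M* lambda - 1) = 4/3 on X0
H0 = 0.75 * f0 * math.log(f0)        # primal value: integral of f0 log f0 over X0 (measure 3/4)
H_star = 0.5 * 2.0 * math.log(2.0)   # H(f*) for the invariant density f* = 2·1_[1/2,1]
print(Q0, H0, f0, H_star)
```

The restricted primal and dual values coincide at log(4/3), matching the unattained supremum on the full space, as Lemma 3.7 (3) predicts; the true invariant density is feasible with the larger value log 2.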

4. Applications.

4.1. The ‘Energy’ method with kernel. Recall that $\phi(t) = \frac12 t^2$ and $H = \{1_{B_1}, \dots, 1_{B_n}\}$ is a partition basis. Then, since
$$Q(\lambda) = \langle\lambda, b\rangle - \frac12\int (M^*\lambda(x))^2\,dx$$
and the conjugate term is weakly (in fact norm) lower semicontinuous, Q is norm upper semicontinuous, so it attains its supremum over compact subsets. Let $\mathbb{R}^{N+1} = \operatorname{Ker}(M^*) \oplus \operatorname{Range}(M)$, the canonical decomposition relative to the operator M. This decomposition is nontrivial since
$$\sum_{i=1}^n g_i = \sum_{i=1}^n (1_{T^{-1}B_i} - 1_{B_i}) = 1 - 1 = 0,$$
so $(0, 1, 1, \dots, 1)^T \in \operatorname{Ker}(M^*)$ (cf. Lemma 3.1). Clearly, if Q restricted to the subspace Range(M) attains its (relative) maximum at λ*, then Q will also be maximized at λ*. To see why dual attainment holds in this case, note that
$$\max_\lambda Q(\lambda) = \max_{\lambda\in\operatorname{Range}(M)} Q(\lambda) = \max_{\lambda\in\operatorname{Range}(M)} \Big(\langle\lambda, b\rangle - \frac12\int (M^*\lambda)^2\,dx\Big) = \max_{\lambda\in\operatorname{Range}(M)} \Big(\langle\lambda, b\rangle - \frac12\langle\lambda, MM^*\lambda\rangle\Big).$$



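The linear operator $MM^*$ maps $\operatorname{Range}(M)$ into $\operatorname{Range}(M)$. For $\lambda \neq 0$ in $\operatorname{Range}(M)$ we have $\langle \lambda, MM^*\lambda\rangle = \|M^*\lambda\|_{L^2}^2 > 0$, because $\operatorname{Range}(M) \cap \operatorname{Ker}(M^*) = \{0\}$. So the operator $MM^*|_{\operatorname{Range}(M)}$ is positive definite. It follows that the restricted functional $Q|_{\operatorname{Range}(M)}$ is a linear term plus a negative definite quadratic form. Hence $-Q|_{\operatorname{Range}(M)}$ is coercive, and $Q$ attains its maximum value. From this point of view there is no need to identify the restricted measure space $X_0(n)$. Applying Theorem 3.6 yields the necessary equation for the optimal value of $\lambda(n)$,
$$\sum_j [\lambda(n)]_j \int g_i g_j\,dx = b_i, \qquad i = 0, 1, \dots, n, \tag{4.1}$$
and the formula for the optimal solution $f_n$,
$$f_n = M^*\lambda(n) = \sum_j [\lambda(n)]_j\, g_j. \tag{4.2}$$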
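The sketch below solves (4.1) and evaluates (4.2) for a uniform partition $B_i = [i/n, (i+1)/n)$ of $[0,1]$ with Lebesgue measure. It makes three assumptions, consistent with Lemma 4.1 and with $\langle\lambda, b\rangle = \lambda_0$ in Section 4.2:

- $g_0 = 1$;
- $g_k = 1_{T^{-1}B_k} - 1_{B_k}$ for $k \ge 1$;
- $b = (1, 0, \dots, 0)$.

Every correlation $\int g_i g_j\,dx$ can then be written in terms of the cell measures $\mu(B_i \cap T^{-1}B_j)$. Here these measures are only approximated by a midpoint rule. The helper names `transition_measures`, `energy_solution` and `example1_map` are hypothetical, and this is not the implementation used for the experiments in Section 4.3. The system $A[\lambda(n)] = b$ is singular but consistent, so a least-squares solve returns one of its solutions. By Lemma 4.1, the resulting $f_n$ is constant on each cell $B_i \cap T^{-1}B_j$.

```python
# Illustrative sketch: all function names are hypothetical, not from the paper.
# Assumptions: g_0 = 1, g_k = 1_{T^{-1}B_k} - 1_{B_k} (k >= 1), b = (1, 0, ..., 0).
import numpy as np


def transition_measures(T, n, samples_per_cell=1000):
    """Approximate A[i, j] = mu(B_i ∩ T^{-1} B_j) for B_i = [i/n, (i+1)/n).

    Lebesgue measure is replaced by a midpoint rule with `samples_per_cell`
    points per cell, so A is only an approximation for general maps T.
    """
    N = n * samples_per_cell
    x = (np.arange(N) + 0.5) / N
    i = np.minimum((x * n).astype(int), n - 1)      # cell containing x
    j = np.minimum((T(x) * n).astype(int), n - 1)   # cell containing T(x)
    A = np.zeros((n, n))
    np.add.at(A, (i, j), 1.0 / N)
    return A


def energy_solution(A):
    """Solve (4.1) for the partition basis; return lambda(n) and f_n on each cell.

    `corr` is the (n+1) x (n+1) correlation matrix a_ij = ∫ g_i g_j of the text,
    expressed through the cell measures A[i, j] = mu(B_i ∩ T^{-1} B_j).
    """
    n = A.shape[0]
    r = A.sum(axis=1)                        # mu(B_i)
    c = A.sum(axis=0)                        # mu(T^{-1} B_i)
    corr = np.empty((n + 1, n + 1))
    corr[0, 0] = r.sum()                     # ∫ g_0^2 = mu(X)
    corr[0, 1:] = c - r                      # ∫ g_0 g_j
    corr[1:, 0] = c - r
    corr[1:, 1:] = np.diag(r + c) - A - A.T  # ∫ g_i g_j for i, j >= 1
    b = np.zeros(n + 1)
    b[0] = 1.0
    # Ker(corr) = Ker(M^*) is nontrivial, so use a least-squares solver.
    lam, *_ = np.linalg.lstsq(corr, b, rcond=None)
    # Lemma 4.1: f_n = M^* lambda equals lam_0 + lam_j - lam_i on B_i ∩ T^{-1} B_j.
    F = lam[0] + lam[None, 1:] - lam[1:, None]
    return lam, F


def example1_map(x):
    """The map of Example 1 in Section 4.3."""
    return np.where(x < 0.5, 2 * x, np.where(x < 0.75, 2 * x - 0.5, 2 * x - 1))


A = transition_measures(example1_map, 27)
lam, F = energy_solution(A)
print((A * F).sum())    # total mass ∫ f_n dx, should be close to 1
```

Consider the doubling map $x \mapsto 2x \bmod 1$, whose invariant density is $1$, with for example $n = 8$. This sketch returns $f_n = 1$ on every cell of positive measure, up to rounding, since $1 = M^*e_0$ already satisfies the constraints.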

Since $[\phi^*]'(s) = s$, the equation to be solved is linear and consistent in $n+1$ variables: $A[\lambda(n)] = b$. Here $A = \{a_{ij}\}$ is the $(n+1)\times(n+1)$ matrix of correlations $a_{ij} = \int g_i(x) g_j(x)\,dx$. Notice that $A = MM^*$, with $\operatorname{Ker}(A) = \operatorname{Ker}(MM^*) = \operatorname{Ker}(M^*)$, and we know that $Q$ is constant along $\operatorname{Ker}(M^*)$. So any solution of (4.1) leads to optimal values for both primal and dual (see also Theorem 3.6). In Section 4.3 we present results of numerical experiments for this problem with respect to the partition basis $\{1_{B_i}\}$. Further details, including some issues of numerical implementation, are in [5].

4.2. The ‘Entropy’ method and domain restriction. In the case of a partition basis, the $X_0$ of Lemma 3.7 may be needed to ensure dual attainment. We show below how to identify $X_0(n)$ by a finite computation. Once this is done, Theorem 3.6 can be invoked to derive the optimality equations in concrete form:
$$\int_{X_0(n)} \exp\{[M^*\lambda(n)](x) - 1\}\, g_i(x)\,dx = b_i, \qquad i = 0, 1, \dots, N, \tag{4.3}$$

from which the primal optimal points are computed according to the formula in Theorem 3.6(2). That is, we recover the solution to $(P_n)$ by solving $(P_n^0)$ with $\lambda(n)$ satisfying (4.3). The solution to $(P_n)$ is then
$$f_0(x) = 1_{X_0(n)}(x)\, \exp\{[M^*\lambda(n)](x) - 1\}.$$

Identification of restricted domain. For the remainder of this section we assume that $n$ is fixed and that $\{B_1, \dots, B_n\}$ is a fixed partition of $X$. We thus have $N = n$, and we suppress the dependence on $n$ where possible. In particular, $X_0 = X_0(n)$. Recall the decomposition $\operatorname{Range}(M) = \operatorname{span}\{b\} \oplus \hat\Lambda$. Then
$$C = \{\lambda \in \hat\Lambda \mid M^*\lambda \le 0\}, \qquad X_0 = X \setminus \bigcup_{\lambda\in C} \{x \mid M^*\lambda(x) < 0\}.$$

Lemma 4.1. For each $i, j = 1, \dots, n$, $M^*\lambda|_{B_i \cap T^{-1}B_j} = (\lambda_0 + \lambda_j - \lambda_i)\, 1_{B_i \cap T^{-1}B_j}$.


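Proof. Since both $\{B_k\}_{k=1}^n$ and $\{T^{-1}B_k\}_{k=1}^n$ are partitions of $X$, the lemma follows directly from two facts: $M^*\lambda = \sum_{k=0}^n \lambda_k g_k$, and $1_{B_i \cap T^{-1}B_j} = 1_{B_i} 1_{T^{-1}B_j}$.

Lemma 4.2. Let $A$ be the $n \times n$ matrix with entries $A_{ij} = \mu(B_i \cap T^{-1}B_j)$, and let $\lambda \in \hat\Lambda$ be such that $M^*\lambda \le 0$. If $(A^{m_1})_{ij} > 0$ and $(A^{m_2})_{ji} > 0$ for some $m_1, m_2 > 0$, then $\lambda_i = \lambda_j$.

Proof. Since $\lambda \in \hat\Lambda$, $\lambda_0 = 0$. Since each $A_{kl} \ge 0$, there is a sequence $\{i_l\}_{l=0}^{m_1+m_2}$ with the following properties: $i_0 = i = i_{m_1+m_2}$, $i_{m_1} = j$, and each $A_{i_l i_{l+1}} > 0$. Then, by Lemma 4.1,
$$(\lambda_{i_{l+1}} - \lambda_{i_l})\, A_{i_l i_{l+1}} = \int (\lambda_{i_{l+1}} - \lambda_{i_l})\, 1_{B_{i_l} \cap T^{-1}B_{i_{l+1}}}\,d\mu = \int_{B_{i_l} \cap T^{-1}B_{i_{l+1}}} M^*\lambda\,d\mu \le 0.$$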

Thus $\lambda_{i_0} \ge \lambda_{i_1} \ge \cdots \ge \lambda_{i_{m_1}} \ge \cdots \ge \lambda_{i_{m_1+m_2}} = \lambda_{i_0}$. In particular, $\lambda_i = \lambda_{i_0} = \lambda_{i_{m_1}} = \lambda_j$.

Proposition 4.3. The following are equivalent:
(i) $A_{ij} > 0$ and $(A^m)_{ji} > 0$ for some $m > 0$;
(ii) $\mu(B_i \cap T^{-1}B_j) > 0$ and $B_i \cap T^{-1}B_j \subset X_0 \pmod{\mu}$.

Proof. (i) ⇒ (ii). Suppose that $\mu(B_i \cap T^{-1}B_j) = A_{ij} > 0$ and $(A^m)_{ji} > 0$, and let $\lambda \in C$. Lemma 4.2 with $m_1 = 1$ and $m_2 = m$ gives $\lambda_i = \lambda_j$. By Lemma 4.1 (with $\lambda_0 = 0$), $M^*\lambda(x) = \lambda_j - \lambda_i = 0$ for $x \in B_i \cap T^{-1}B_j$. This establishes that $B_i \cap T^{-1}B_j \subset X_0$.

(ii) ⇒ (i). We assume that $\mu(B_i \cap T^{-1}B_j) = A_{ij} > 0$, but that $(A^m)_{ji} = 0$ for all $m > 0$. We need to construct a $\hat\lambda \in C$ such that $M^*\hat\lambda|_{B_i \cap T^{-1}B_j} < 0$. This will show that $B_i \cap T^{-1}B_j$ is disjoint from $X_0$, $\mu$-a.e.

Let $I = \{j\} \cup \{k : (A^m)_{jk} > 0 \text{ for some } m > 0\}$, and define $\lambda$ by putting $\lambda_l = -1_I(l)$ and $\lambda_0 = 0$. Observe that:
(a) $\lambda_i = 0$ (since $i \notin I$) and $\lambda_j = -1$;
(b) if $k \in I$ and $A_{kl} > 0$, then $l \in I$.

Now, by Lemma 4.1, if $A_{kl} > 0$ then $M^*\lambda|_{B_k \cap T^{-1}B_l} = \lambda_l - \lambda_k$. By observation (a), $M^*\lambda|_{B_i \cap T^{-1}B_j} = -1$. We now check that $M^*\lambda \le 0$. By observation (b), if $\lambda_k = -1$ and $A_{kl} > 0$, then $\lambda_l = -1$, so $M^*\lambda|_{B_k \cap T^{-1}B_l} = 0$. On the other hand, if $\lambda_k = 0$, then $\lambda_l - \lambda_k \le 0$. So in any event $M^*\lambda \le 0$. Finally, decompose $\lambda = \hat\lambda + z$ with $\hat\lambda \in \hat\Lambda$ and $z \in \operatorname{Ker}(M^*)$. Then $M^*\hat\lambda = M^*\lambda \le 0$ and $M^*\hat\lambda|_{B_i \cap T^{-1}B_j} < 0$.
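Criterion (i) of Proposition 4.3 is a reachability condition. Consider the directed graph on the cells with an edge $i \to j$ whenever $A_{ij} > 0$. A cell $B_i \cap T^{-1}B_j$ with $A_{ij} > 0$ lies in $X_0$ exactly when $i$ can be reached from $j$ along a path of positive length.

The sketch below applies this test cell by cell using breadth-first search. It uses 0-based indices, assumes $A$ is given as a dense NumPy array, and its function names are hypothetical. For large sparse $A$, the strongly connected component approach via Tarjan's algorithm mentioned in Remarks 4.4 would be the more economical choice.

```python
# Illustrative sketch: function names are hypothetical, not from the paper.
from collections import deque

import numpy as np


def reachable_set(succ, j):
    """Boolean mask of I(j) = {k : (A^m)_{jk} > 0 for some m > 0}."""
    seen = np.zeros(len(succ), dtype=bool)
    queue = deque()
    for k in succ[j]:            # every path of positive length starts with j -> k
        if not seen[k]:
            seen[k] = True
            queue.append(k)
    while queue:
        k = queue.popleft()
        for l in succ[k]:
            if not seen[l]:
                seen[l] = True
                queue.append(l)
    return seen


def restricted_cells(A):
    """Return (A_hat, I0): A_hat keeps A[i, j] exactly when i lies in I(j)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    succ = [np.flatnonzero(A[i] > 0) for i in range(n)]   # edge i -> j iff A[i, j] > 0
    A_hat = np.zeros_like(A)
    for j in range(n):
        preds = np.flatnonzero(A[:, j] > 0)               # all i with A[i, j] > 0
        if preds.size == 0:
            continue
        in_I_j = reachable_set(succ, j)
        keep = preds[in_I_j[preds]]                       # B_i ∩ T^{-1} B_j ⊂ X_0
        A_hat[keep, j] = A[keep, j]
    I0 = [(int(i), int(j)) for i, j in zip(*np.nonzero(A_hat))]
    return A_hat, I0
```

For example, take $A = \begin{pmatrix} 0.1 & 0.2 & 0 \\ 0.1 & 0 & 0.3 \\ 0 & 0 & 0.3 \end{pmatrix}$. The cell with (0-based) indices $(1, 2)$ is discarded, since vertex 2 cannot reach vertex 1, and the sketch returns $I_0 = \{(0,0), (0,1), (1,0), (2,2)\}$.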

Proposition 4.3 suggests an elementary iterative procedure for identifying $X_0$ up to a set of measure 0:
1. Calculate the $n \times n$ matrix $A_{ij} = \mu(B_i \cap T^{-1}B_j)$.
2. For each $A_{ij} > 0$, determine $I(j) = \{k \mid (A^m)_{jk} > 0 \text{ for some } m > 0\}$. If $i \in I(j)$, then $B_i \cap T^{-1}B_j \subset X_0$; set $\hat A_{ij} := A_{ij}$. Otherwise, set $\hat A_{ij} := 0$.

At the end of this procedure, set $I_0 = \{(i,j) \mid \hat A_{ij} > 0\}$ and take $X_0 = \bigcup_{(i,j)\in I_0} B_i \cap T^{-1}B_j$.

Remarks 4.4.
1. For reasonably regular maps $T$ the matrix $A$ is very sparse, with $O(n)$ nonzero entries, which can be stored as a list of triples $(i, j, A_{ij})$. Consequently each set $I(j)$ can be determined in $O(n)$ operations, mostly array look-ups. The identification of $I_0$ via the above procedure thus requires at most $O(n^2)$ operations.


2. Proposition 4.3 essentially characterizes $X_0$ in terms of a certain directed graph¹¹. It is the union of those elements $B_i \cap T^{-1}B_j$ of the partition $\mathcal{P} \vee T^{-1}\mathcal{P}$ for which $i$ and $j$ lie in the same strongly connected component of that graph. If $A$ has $O(n)$ non-zero entries, all of these components (and the edges connecting them) can be found with $O(n)$ computational effort by Tarjan's algorithm [14]. See [8] for related work on the use of discrete models to obtain recurrent components and Lyapunov functions of dynamical systems.

The following corollary to Proposition 4.3 will be used below.

Corollary 4.5. If $(\hat A^m)_{ik} > 0$, then there is an $M > 0$ such that $(\hat A^M)_{ki} > 0$.

Proof. There are indices $i = i_0, i_1, \dots, i_m = k$ and integers $M_1, \dots, M_m$ such that $\hat A_{i_{l-1} i_l} > 0$ and $(\hat A^{M_l})_{i_l i_{l-1}} > 0$ for $l = 1, \dots, m$. Then
$$(\hat A^{M_1+\cdots+M_m})_{ki} \ge (\hat A^{M_m})_{i_m i_{m-1}} \cdots (\hat A^{M_1})_{i_1 i_0} > 0.$$

Solution of the necessary conditions. Since the solution to $(P_n)$ is obtained via $(D_n)$, one needs to maximize
$$Q(\lambda) = \langle\lambda, b\rangle - \int_{X_0} \exp\{M^*\lambda(x) - 1\}\,dx.$$

Using Lemma 4.1 and Proposition 4.3, we have
$$Q(\lambda) = \lambda_0 - \exp\{\lambda_0 - 1\} \sum_{(i,j)\in I_0} \hat A_{ij} \exp\{\lambda_j - \lambda_i\},$$

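so that $(D_n)$ is solved by minimizing
$$G(\lambda) = \sum_{(i,j)\in I_0} \hat A_{ij} \exp\{\lambda_j - \lambda_i\}$$
and setting
$$\lambda_0 = 1 - \log\Big(\sum_{(k,l)\in I_0} \hat A_{kl} \exp\{\lambda_l - \lambda_k\}\Big).$$
For fixed $\lambda_1, \dots, \lambda_n$, this choice of $\lambda_0$ maximizes $Q$, and the maximal value is $-\log G(\lambda)$. The optimal values of $\lambda$ can then be used to recover the solution to $(P_n)$ as in Theorem 3.6(2). The minimum of $G(\lambda)$ can be calculated using standard optimization algorithms. However, we obtained rapid convergence with a fixed point method, which we now describe. The equations $\partial G / \partial \lambda_i = 0$ reduce to
$$\sum_l \hat A_{il} \exp\{\lambda_l - \lambda_i\} = \sum_k \hat A_{ki} \exp\{\lambda_i - \lambda_k\}.$$
Thus, for $i = 1, \dots, n$,
$$(e^{-\lambda_i})^2 = \frac{\sum_{k\neq i} \hat A_{ki}\, e^{-\lambda_k}}{\sum_{l\neq i} \hat A_{il}\, e^{\lambda_l}},$$
which suggests the iterative scheme $\lambda_i^{(m+1)} = -\frac12 \log F_i\big(e^{-\lambda_1^{(m)}}, \dots, e^{-\lambda_n^{(m)}}\big)$ with the choice
$$F_i(x_1, \dots, x_n) = \frac{\sum_{k\neq i} \hat A_{ki}\, x_k}{\sum_{l\neq i} \hat A_{il}\, x_l^{-1}}.$$
In practice, it is more convenient to work directly with the values $x_i^{(m)} = e^{-\lambda_i^{(m)}}$, updating according to
$$x_i^{(m+1)} = \frac{\sqrt{F_i(x^{(m)})}}{\sum_j \sqrt{F_j(x^{(m)})}}.$$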
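The NumPy sketch below implements this fixed-point scheme. It is written in terms of $\lambda$ rather than $x$, which is equivalent after normalization because $G$ is invariant under $\lambda \mapsto \lambda - \log c$. Case (i) of Remarks 4.6 is handled by setting $F_i := 1$. The sketch makes two assumptions:

- $\hat A$ comes from the procedure following Proposition 4.3.
- The starting point is positive, so case (ii) of Remarks 4.6 does not arise.

The function name `entropy_fixed_point` and its parameters `damping`, `tol` and `max_iter` are hypothetical additions, not part of the scheme described above. With `damping=1.0` the sketch performs exactly the update above. Smaller values of `damping` average successive iterates in $\lambda$. We include this option because, in the smallest case of two cells with $\hat A_{12} \neq \hat A_{21}$, the undamped update started from $x^{(0)} = 1$ alternates between two points.

The returned cell values are $\exp\{\lambda_0 - 1 + \lambda_j - \lambda_i\} = e^{\lambda_j - \lambda_i}/G(\lambda)$ on $B_i \cap T^{-1}B_j$. This follows from Lemma 4.1 and the formula for $\lambda_0$.

```python
# Illustrative sketch: the function name and the damping/tol/max_iter
# parameters are hypothetical additions; damping=1.0 is the plain update.
import numpy as np


def entropy_fixed_point(A_hat, damping=1.0, tol=1e-12, max_iter=10_000):
    """Minimize G(lambda) = sum_ij A_hat[i, j] exp(lambda_j - lambda_i) by the
    fixed-point scheme lambda_i <- -0.5 * log F_i(exp(-lambda)).

    Returns (x, lam0, f, converged): x = exp(-lambda) lies on the unit simplex
    and f[i, j] is the density value on B_i ∩ T^{-1} B_j (0 where A_hat is 0).
    Assumes A_hat comes from the X_0 procedure, so F_i > 0 for non-isolated i.
    """
    A = np.asarray(A_hat, dtype=float)
    n = A.shape[0]
    off = A - np.diag(np.diag(A))        # only k != i and l != i enter F_i
    active = off.sum(axis=1) > 0         # case (i) of Remarks 4.6: F_i := 1 otherwise
    lam = np.full(n, np.log(n))          # x^(0) = 1, rescaled onto the simplex
    converged = False
    for _ in range(max_iter):
        x = np.exp(-lam)
        num = off.T @ x                  # sum_{k != i} A_hat[k, i] x_k
        den = off @ (1.0 / x)            # sum_{l != i} A_hat[i, l] / x_l
        F = np.ones(n)
        F[active] = num[active] / den[active]
        new = (1.0 - damping) * lam + damping * (-0.5 * np.log(F))
        new += np.log(np.exp(-new).sum())    # normalize x = exp(-lambda) to sum 1
        step = np.max(np.abs(new - lam))
        lam = new
        if step < tol:
            converged = True
            break
    x = np.exp(-lam)
    ratio = np.outer(x, 1.0 / x)         # ratio[i, j] = exp(lambda_j - lambda_i)
    G = np.sum(A * ratio)
    lam0 = 1.0 - np.log(G)
    f = np.where(A > 0, ratio / G, 0.0)  # exp(lam0 - 1 + lambda_j - lambda_i)
    return x, lam0, f, converged
```

At a minimizer of $G$, the cell values satisfy $\sum_{i,j} \hat A_{ij} f_{ij} = 1$. Each vertex also balances, in the sense that the mass flowing in equals the mass flowing out. Both properties are convenient checks on the output.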

We have no general proof of convergence for this iteration, but note that it worked in all cases we tested, using $x_i^{(0)} = 1$.

¹¹ The vertices are the elements of $\mathcal{P}$, and the edge set consists of those pairs $(i, j)$ with $A_{ij} > 0$.

Remarks 4.6.


1. The definition of $F_i$ needs slight modification to allow for two possibilities: (i) $\sum_{l\neq i} \hat A_{il} = 0$, or (ii) $x_l = 0$. In case (i), Corollary 4.5 ensures that $\sum_{k\neq i} \hat A_{ki} = 0$, from which it follows that $G(\lambda)$ is independent of $\lambda_i$; in this case, set $F_i(x) := 1$. In case (ii), an indeterminate expression is obtained only for those $i$ with $\hat A_{il} > 0$. Continuity of $F_i$ can then be assured by putting $F_i(x) = 0$.
2. The normalization of $x^{(m+1)}$ ensures that the iteration scheme preserves the unit simplex in $(\mathbb{R}^+)^n$ without altering the value of $G(\lambda)$. Indeed, scaling $x \mapsto cx$ changes $\lambda_i = -\log x_i$ to $\lambda_i - \log c$, and $G(\lambda) = G(\lambda - \log c)$ for every $c > 0$.
3. The iteration is not a uniform contraction of the unit simplex, since it preserves the boundary.

4.3. Numerical examples. We now apply the energy and entropy minimization approaches to approximate the invariant measures for several examples.

Example 1. Let
$$T(x) = \begin{cases} 2x & x \in [0, 1/2), \\ 2x - 1/2 & x \in [1/2, 3/4), \\ 2x - 1 & x \in [3/4, 1]. \end{cases}$$
The invariant measure for $T$ has density $f_*(x) = 2 \cdot 1_{[1/2,1]}(x)$. For a sequence of values of $n$, we calculated approximations $f_n^{(V)}$ and $f_n^{(H)}$ which minimize $V(f) = \frac12\int f^2$ and $H(f) = \int f \log f$, respectively. The approximation errors $\|f_* - f_n^{(V)}\|_{L^1}$ and $\|f_* - f_n^{(H)}\|_{L^1}$ are displayed in Table 1. The density approximations for $n = 729$ are displayed in the first row of Figure 1.

Notice that $f_{729}^{(V)}$ has some negative values in $[0, 1/2)$. This is possible because our formulation of the optimization problem with $V$ imposes no positivity condition, although $[f_n]_- \to 0$ and $[f_n]_+ \to f_*$ (see [5, Remark 5.3(2)]). The spikes in $f_{729}^{(H)}$ occur at preimages of $1/2$, a boundary point of $\operatorname{supp}(f_*)$. They disappear when $1/2$ is an endpoint of some $B_i$.

Example 2. Let
$$T(x) = \begin{cases} 3x & x \in [0, 1/4), \\ x + 1/2 & x \in [1/4, 1/2), \\ x - 1/2 & x \in [1/2, 3/4), \\ 3x - 2 & x \in [3/4, 1]. \end{cases}$$

The invariant measure for $T$ has density
$$f_*(x) = 1.2 \cdot 1_{[0,1/4)\cup(3/4,1]}(x) + 0.8 \cdot 1_{[1/4,3/4]}(x).$$
For a sequence of values of $n$, we calculated approximations $f_n^{(V)}$ and $f_n^{(H)}$ which minimize $V(f) = \frac12\int f^2$ and $H(f) = \int f\log f$, respectively. The approximation errors $\|f_* - f_n^{(V)}\|_{L^1}$ and $\|f_* - f_n^{(H)}\|_{L^1}$ are displayed in Table 1. The density approximations for $n = 729$ are displayed in the second row of Figure 1.

Example 3. The tent map $T_r(x) = r(0.5 - |x - 0.5|)$ admits an invariant density $f_r$ of bounded variation whenever $r \in (1, 2]$. Therefore $H(f_r) < \infty$, and a sequence of $f_n$ solving the finitely constrained optimization problems $(P_n)$ will converge in $L^1$ to $f_r$ as $n \to \infty$. In fact, if $r \in (2^{2^{-(k+1)}}, 2^{2^{-k}})$, then the density is supported on a union of $2^k$ intervals. Figure 2 shows the $n = 729$ minimum entropy approximation for the tent map with $r = 1.3$. The displayed density is supported on $X_0(n)$, which is a union of several intervals. The two larger intervals contain the support of the invariant density for $T_r$. The remaining small intervals are clustered near the unstable fixed points of $T_r$, at $x = 0$ and $x = r/(1+r) \approx 0.565$. The correct density has no simple formula at this parameter value, so a direct calculation of the approximation error is not possible.

Table 1. $L^1$ approximation errors for energy and entropy minimization approaches to invariant density calculations.

| $n$ | Example 1, $\lVert f_* - f_n^{(V)}\rVert_{L^1}$ | Example 1, $\lVert f_* - f_n^{(H)}\rVert_{L^1}$ | Example 2, $\lVert f_* - f_n^{(V)}\rVert_{L^1}$ | Example 2, $\lVert f_* - f_n^{(H)}\rVert_{L^1}$ |
|---|---|---|---|---|
| 3 | 0.666667 | 0.699359 | 0.098485 | 0.104804 |
| 9 | 0.346007 | 0.326124 | 0.063600 | 0.067046 |
| 27 | 0.196225 | 0.134116 | 0.042583 | 0.043930 |
| 81 | 0.090671 | 0.051596 | 0.035391 | 0.036633 |
| 243 | 0.038662 | 0.019901 | 0.027982 | 0.029287 |
| 729 | 0.021628 | 0.006858 | 0.024727 | 0.025740 |
| 2187 | 0.008310 | 0.002605 | 0.022186 | 0.023187 |
| 6561 | 0.003775 | 0.000863 | 0.020258 | 0.021252 |

Fig. 1. $n = 729$ density approximations for the maps in Examples 1 and 2 using Energy and Entropy minimization.

Example 4. The logistic map $T_r(x) = r x(1-x)$ admits the invariant density
$$f_*(x) = \frac{1}{\pi\sqrt{x(1-x)}}$$
when $r = 4.0$. Then $H(f_*) = 0.241564\cdots < \infty$. So the minimum entropy method will produce a sequence of density approximations $f_n$ such that $\lim_{n\to\infty} \|f_n - f_*\|_{L^1} = 0$, even though neither $T_r$ nor any of its iterates is expanding. Figure 3 shows the $n = 729$ minimum entropy approximation, for which $\|f_n - f_*\|_{L^1} = 0.24683$.

Fig. 2. Entropy minimization with $n = 729$ for density approximation for the tent map (Example 3).

Fig. 3. Entropy minimization with $n = 729$ for density approximation for the fully developed logistic map (Example 4).

Appendix I. Derivation of $(D_n)$. The Lagrangian for $(P_n)$ is
$$L(f, \lambda) = \Phi(f) - \langle\lambda, Mf - b\rangle, \qquad f \in L^p,\ \lambda \in \mathbb{R}^{N+1},$$
where $\langle\cdot,\cdot\rangle$ denotes the inner product in $\mathbb{R}^{N+1}$. Next, define
$$Q(\lambda) = \inf_{f\in L^p} L(f,\lambda) = \langle\lambda, b\rangle - \sup_{f\in L^p}\{\langle\lambda, Mf\rangle - \Phi(f)\} = \langle\lambda, b\rangle - \Phi^*(M^*\lambda).$$
Here $\Phi^*: L^q \to (-\infty, \infty]$ denotes the Fenchel (convex) conjugate of $\Phi$, that is,
$$\Phi^*(g) = \sup_{f\in L^p}\Big\{\int f(x) g(x)\,dx - \Phi(f)\Big\}$$


and the adjoint $M^*: \mathbb{R}^{N+1} \to L^q$ is calculated as
$$M^*\lambda = \sum_{k=0}^{n} \lambda_k g_k.$$
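As a sanity check on the conjugates used in Section 4, the sketch below approximates $\phi^*(s) = \sup_t \{ts - \phi(t)\}$ by a maximum over a finite grid of $t$ values. The grid is a crude device chosen only for illustration. The sketch compares the result with two closed forms:

- $\phi^*(s) = s^2/2$ for $\phi(t) = t^2/2$;
- $\phi^*(s) = e^{s-1}$ for $\phi(t) = t\log t$, taking $0 \log 0 = 0$ and $\phi(t) = \infty$ for $t < 0$. The latter convention is encoded by using only $t \ge 0$ in the grid.

These closed forms give $[\phi^*]'(s) = s$ and $[\phi^*]'(s) = e^{s-1}$, the derivatives behind (4.1) and (4.3). The function names are hypothetical.

```python
# Illustrative sketch: function names are hypothetical; the grid maximum only
# approximates the supremum defining the convex conjugate.
import numpy as np


def conjugate_on_grid(phi, s, t_grid):
    """Approximate phi*(s) = sup_t {t*s - phi(t)} by a maximum over t_grid."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    values = np.outer(s, t_grid) - phi(t_grid)[None, :]
    return values.max(axis=1)


def energy(t):
    """phi(t) = t^2 / 2, the 'energy' integrand of Section 4.1."""
    return 0.5 * t**2


def entropy(t):
    """phi(t) = t log t on t >= 0 (with 0 log 0 = 0); the grid never uses t < 0."""
    safe = np.where(t > 0, t, 1.0)
    return t * np.log(safe)


s = np.linspace(-2.0, 2.0, 9)
t_energy = np.linspace(-5.0, 5.0, 200_001)
t_entropy = np.linspace(0.0, 5.0, 200_001)
err_energy = np.max(np.abs(conjugate_on_grid(energy, s, t_energy) - 0.5 * s**2))
err_entropy = np.max(np.abs(conjugate_on_grid(entropy, s, t_entropy) - np.exp(s - 1.0)))
print(err_energy, err_entropy)  # both small; limited only by the grid spacing
```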

We note that $\Phi^*$ is easily seen to be both convex and weakly lower semicontinuous on $L^q$. The functional $\Phi^*$ is the Banach space generalization of the classical convex conjugate $\phi^*$ of a real function $\phi: \mathbb{R} \to (-\infty, \infty]$. See Rockafellar [12] for definitions and elementary properties.

When $\Phi$ is of integral type, there are some important connections between the two concepts. For example, suppose that $f \in L^p$ is such that $\phi(f(\cdot))$ is integrable, and let $g \in L^q$. Substituting $t = f(x)$ and $s = g(x)$ into Fenchel's inequality $\phi(t) + \phi^*(s) \ge ts$ and integrating, one obtains
$$\int \phi^*(g(x))\,dx \ge \int f(x) g(x)\,dx - \int \phi(f(x))\,dx.$$
It follows that $\int \phi^*(g(x))\,dx \in (-\infty, \infty]$ unambiguously, and that $\int \phi^*(g(x))\,dx \ge \Phi^*(g)$. These and many other properties of integral-type objectives are derived in Rockafellar [11]. We summarize the facts that we will use.

Lemma I.1. Let $\phi: \mathbb{R} \to (-\infty, \infty]$ be a convex, lower semicontinuous and proper function.
1. Suppose that for every $f \in L^p$, $\Phi(f) = \int \phi(f(x))\,dx$ is unambiguously an element of $(-\infty, \infty]$. Then for each $g \in L^q$, $\int \phi^*(g(x))\,dx$ is unambiguously defined as an element of $(-\infty, \infty]$, and
$$\Phi^*(g) = \int \phi^*(g(x))\,dx.$$
2. If $\phi^*(g(\cdot))$ is integrable for at least one $g \in L^q$, then the integral $\int \phi(f(x))\,dx$ is well defined (possibly $= \infty$) for every $f \in L^p$. This will be the case, for example, if $\phi^*$ is proper and $\mu(X) < \infty$.

Equipped with these tools, we can write down the dual optimization problem associated to $(P_n)$:
$$\text{Maximize } Q(\lambda) = \langle\lambda, b\rangle - \int_X \phi^*(M^*\lambda(x))\,dx \quad \text{subject to } \lambda \in \mathbb{R}^{N+1}. \tag{$D_n$}$$
This is an unconstrained, finite-dimensional and concave problem. It follows directly from the definitions of $Q$ and $L$ that
$$Q(\lambda) \le \inf_{f\in L^p,\ Mf = b} L(f, \lambda) \le \Phi(f) \tag{I.1}$$
whenever $f$ is feasible for $(P_n)$ and $\lambda \in \mathbb{R}^{N+1}$. Hence the (maximal) value of $(D_n)$ is majorized by the (minimal) value of $(P_n)$; this is the so-called principle of weak duality. Solving the unconstrained dual problem $(D_n)$ is therefore equivalent to solving the primal problem $(P_n)$ precisely when this ‘duality gap’ can be closed. Theorem 3.6 describes one situation, tailored to our applications, in which the duality gap can be closed.

REFERENCES


[1] J. Borwein and A. Lewis, Convergence of best entropy estimates, SIAM J. Optim., 1 (1991), pp. 191–205.
[2] J. Borwein and A. Lewis, Duality relationships for entropy-like minimization problems, SIAM J. Control Optim., 26 (1991), pp. 325–338.
[3] J. Borwein and A. Lewis, On the convergence of moment problems, Trans. AMS, 325 (1991), pp. 249–271.
[4] C. Bose and R. Murray, Dynamical conditions for convergence of a maximum entropy method for Frobenius–Perron operator equations, Appl. Math. Comp., to appear.
[5] C. Bose and R. Murray, Minimum ‘Energy’ approximations of invariant measures for non-singular transformations, Discrete Contin. Dyn. Syst., 14 (2006), pp. 597–615.
[6] J. Ding, A maximum entropy method for solving Frobenius–Perron operator equations, Appl. Math. Comp., 93 (1998), pp. 155–168.
[7] J. Ding and A. Zhou, Finite approximations of Frobenius–Perron operators. A solution of Ulam's conjecture to multi-dimensional transformations, Phys. D, 92 (1996), pp. 61–68.
[8] W. D. Kalies, K. Mischaikow, and R. C. A. M. VanderVorst, An algorithmic approach to chain recurrence, preprint, 2005.
[9] T.-Y. Li, Finite approximation for the Perron–Frobenius operator. A solution to Ulam's conjecture, J. Approx. Theory, 17 (1976), pp. 177–186.
[10] R. Murray, Approximation error for invariant density calculations, Discrete Contin. Dyn. Syst., 4 (1998), pp. 535–558.
[11] R. T. Rockafellar, Integrals which are convex functionals, Pacific J. Math., 24 (1968), pp. 525–539.
[12] R. T. Rockafellar, Convex Analysis, Princeton University Press, 1970.
[13] H. H. Schaefer, Topological Vector Spaces, vol. 3 of Graduate Texts in Mathematics, Springer Verlag, 1970.
[14] R. Tarjan, Depth-first search and linear graph algorithms, SIAM J. Comput., 1 (1972), pp. 146–160.
[15] S. Ulam, A Collection of Mathematical Problems, Interscience Publishers, 1960.