Primal-dual methods for solving infinite-dimensional games

P. Dvurechensky, Yu. Nesterov, and V. Spokoiny*

June 2014

Abstract. In this paper we show that infinite-dimensional differential games with a simple objective functional can be solved in a finite-dimensional dual form, in the space of dual multipliers for the constraints on the end points of the trajectories. The primal solutions can then be easily reconstructed by appropriate dual subgradient schemes. The suggested schemes are justified by a worst-case complexity analysis.

1 Introduction

In recent years one can observe increasing interest in primal-dual subgradient methods. This line of research, started in [4], has led to special methods that allow one to reconstruct an approximate solution of the conjugate problem. In order to do this, the methods need access to the internal variables of the oracle; therefore, all these methods are problem-specific. This approach is especially interesting when the primal and conjugate problems have different levels of complexity. For example, we can have a primal minimization problem of very high dimension, with a very simple objective function, a basic feasible set, and a small number of linear equality constraints. Introducing Lagrange multipliers for these linear constraints, we can pass to the conjugate (dual) problem¹, which has good chances to be simple in view of its small dimension. The only delicate point is the reconstruction of the primal variables from the minimization process run in the conjugate space. In [2] this approach was applied to problems of Optimal Control with convex constraints on the end point of the trajectory. These constraints were treated via linear operators from the infinite-dimensional space of variables (controls) to a finite-dimensional space of phase variables. It was shown that an appropriate optimization process in the latter space can also generate a

* The research presented in this paper was partially supported by the Laboratory of Structural Methods of Data Analysis in Predictive Modeling, MIPT, through RF government grant ag.11.G34.31.0073, and by RFBR research projects No. 13-01-12007-ofi_m and 14-01-00722-a.
¹ Since the objective and the feasible set of our problem are simple, this can very often be done in an explicit form.


nearly optimal sequence of controls (functions of time). Moreover, this technique was supported by a worst-case complexity analysis.

In this paper we make the next step in this direction. We consider an infinite-dimensional saddle-point problem whose variables (controls) must satisfy some linear equality constraints. We show that these constraints can be dualized by finite-dimensional multipliers. Moreover, it appears that the dual counterpart of our problem is again a saddle-point problem, but a finite-dimensional one (we call this problem conjugate). We show how to reconstruct the infinite-dimensional primal strategies from a special finite-dimensional scheme which solves the conjugate problem.

The paper is organized as follows. In Section 2 we show how to form the conjugate problem for an initial finite-dimensional convex-concave saddle-point problem with equality constraints. This transformation seems to be new even in this simplest situation; therefore we devote a separate section to it. In Section 3 we consider the basic formulation of Differential Games with a convex-concave objective and with the trajectories of the players governed by systems of linear differential equations. We treat the end points of the trajectories as images of linear operators from an infinite-dimensional to a finite-dimensional space. For future applications, we derive some bounds on the norms of these operators. In Section 4 we write down an equivalent conjugate saddle-point problem in the finite-dimensional space of dual multipliers and derive some bounds on the size of the optimal conjugate solutions. At the end of that section, we present a numerical scheme and derive upper bounds on the quality of the primal and conjugate solutions. In the last Section 5 we consider differential games with an objective function satisfying a strong convexity assumption. For this case, we obtain better complexity bounds.

2 Duality for saddle-point problems

In this section, our goal is to solve the following saddle-point problem:

min_{x∈X} { max_{y∈Y} {Φ(x, y) : Gy = g} : Hx = h } = max_{y∈Y} { min_{x∈X} {Φ(x, y) : Hx = h} : Gy = g },   (2.1)

where G ∈ R^{k×m}, g ∈ R^k, H ∈ R^{l×n}, h ∈ R^l, the sets X, Y are closed, convex, and bounded, the function Φ(·, y) is convex for any fixed y, and the function Φ(x, ·) is concave for any fixed x. In this problem, we can pass to a dual formulation by introducing new variables as Lagrange multipliers for the constraints.

Lemma 2.1. Problem (2.1) is equivalent to the following one:

min_λ max_µ { min_{x∈X} max_{y∈Y} [Φ(x, y) − ⟨µ, Hx⟩ + ⟨λ, Gy⟩] + ⟨µ, h⟩ − ⟨λ, g⟩ },   (2.2)

which we call the conjugate problem to (2.1). Here λ ∈ R^k and µ ∈ R^l.

Proof. Let us consider the inner problem in the right-hand side of (2.1). For each y ∈ Y it is a problem of minimization of a convex function over a convex set with a linear constraint. Hence,

it is equivalent to

f(y) := min_{x∈X} max_µ {Φ(x, y) + ⟨µ, h − Hx⟩}.   (2.3)

Due to convexity in x, concavity in µ, and the fact that X is convex and compact, we can swap min and max:

f(y) = max_µ min_{x∈X} {Φ(x, y) + ⟨µ, h − Hx⟩}.

Note that f(y) in (2.3) is a concave function of y. So the outer problem in the right-hand side of (2.1) is a problem of maximization of a concave function over a convex set with a linear constraint. Hence it is equivalent to max_{y∈Y} min_λ {f(y) + ⟨λ, Gy − g⟩}. Using the same reasoning as above, we conclude that max_{y∈Y} min_λ {f(y) + ⟨λ, Gy − g⟩} = min_λ max_{y∈Y} {f(y) + ⟨λ, Gy − g⟩}. Hence we have

(2.1) = min_λ max_{y∈Y} max_µ min_{x∈X} {Φ(x, y) + ⟨µ, h − Hx⟩ + ⟨λ, Gy − g⟩}.

Swapping the two operations of maximization, we get

(2.1) = min_λ max_µ max_{y∈Y} min_{x∈X} {Φ(x, y) + ⟨µ, h − Hx⟩ + ⟨λ, Gy − g⟩}.

Since Φ(x, y) + ⟨µ, h − Hx⟩ + ⟨λ, Gy − g⟩ is convex in x and concave in y, and since X and Y are convex compact sets, we can swap max_{y∈Y} and min_{x∈X} and obtain (2.1) = (2.2). □

Note that the saddle function of the inner problem in (2.2),

ψ(λ, µ) := min_{x∈X} max_{y∈Y} [Φ(x, y) − ⟨µ, Hx⟩ + ⟨λ, Gy⟩],   (2.4)

is well defined for every λ, µ, since its objective function is convex-concave and the sets X, Y are convex and compact. Let us study the properties of the function ψ(λ, µ).

Lemma 2.2. Let (x*, y*) be a saddle point of problem (2.4) with some λ ∈ R^k and µ ∈ R^l being fixed. Then the function ψ(·, µ) is convex for any fixed µ ∈ R^l, and its subgradient ∇_λ ψ(λ, µ) = Gy* is a bounded function of (λ, µ). Similarly, the function ψ(λ, ·) is concave for any fixed λ ∈ R^k, and its supergradient ∇_µ ψ(λ, µ) = −Hx* is a bounded function of (λ, µ).

Proof. Since the saddle point in problem (2.4) exists for any λ, µ, we have

ψ(λ, µ) = min_{x∈X} [ψ̃(x, λ) − ⟨µ, Hx⟩],   (2.5)
ψ(λ, µ) = max_{y∈Y} [ψ̂(y, µ) + ⟨λ, Gy⟩],   (2.6)

where

ψ̃(x, λ) = max_{y∈Y} [Φ(x, y) + ⟨λ, Gy⟩],   (2.7)
ψ̂(y, µ) = min_{x∈X} [Φ(x, y) − ⟨µ, Hx⟩].   (2.8)

In the first case, since we take a minimum of functions linear in µ, the result is concave in µ. So ψ(λ, µ) is concave in µ for any fixed λ. Similarly, from the second case we get that ψ(λ, µ) is convex in λ for any fixed µ.

Let us fix some λ and µ0. Denote by (x0*, y0*) the saddle point of problem (2.4) for these values, and by (x*, y*) the saddle point of this problem for the same λ and an arbitrary µ. Then

min_{x∈X} [ψ̃(x, λ) − ⟨µ, Hx⟩] = ψ̃(x*, λ) − ⟨µ, Hx*⟩,   min_{x∈X} [ψ̃(x, λ) − ⟨µ0, Hx⟩] = ψ̃(x0*, λ) − ⟨µ0, Hx0*⟩.

We have the following:

ψ(λ, µ) − ψ(λ, µ0) + ⟨Hx0*, µ − µ0⟩
= ψ̃(x*, λ) − ⟨Hx*, µ⟩ − ψ̃(x0*, λ) + ⟨Hx0*, µ0⟩ + ⟨Hx0*, µ − µ0⟩
= ψ̃(x*, λ) − ⟨Hx*, µ⟩ − (ψ̃(x0*, λ) − ⟨Hx0*, µ⟩)
= min_{x∈X} {ψ̃(x, λ) − ⟨Hx, µ⟩} − (ψ̃(x0*, λ) − ⟨Hx0*, µ⟩) ≤ 0.

So, by definition, the vector −Hx* is a supergradient of ψ(λ, µ) with respect to µ. In the same way we prove that Gy* is a subgradient of ψ(λ, µ) with respect to λ. Since the sets X and Y are bounded, these supergradients and subgradients are bounded too. □

Thus, we started from the saddle-point problem (2.1) and obtained the equivalent problem (2.2), which also has a saddle-point structure, and its objective function is also convex-concave. Moreover, for this function we know the partial sub- and supergradients. Hence, if we can solve problem (2.4) efficiently (or in a closed form) for every λ, µ, then we can apply a standard method (e.g. [4]) for solving the conjugate problem. This approach can be efficient if the sets X and Y are simple and the dimensions of the linear constraints are smaller than the dimensions of x and y. Below we apply this dualization approach to more difficult infinite-dimensional problems from differential game theory.
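To make the dualization concrete, here is a small numerical sketch (our own illustration, not code from the paper). We take the separable saddle function Φ(x, y) = ½‖x‖² − ½‖y‖² on the boxes X = Y = [−1, 1]^n, for which the inner problem (2.4) is solvable in closed form, and run a plain subgradient scheme on the conjugate variables (λ, µ). The matrices H, G, the right-hand sides, the step size, and the iteration count are all arbitrary choices for the demonstration.

```python
import numpy as np

# Data: Phi(x, y) = 0.5||x||^2 - 0.5||y||^2 on boxes X = Y = [-1, 1]^n,
# with constraints Hx = h and Gy = g (all data illustrative).
rng = np.random.default_rng(0)
n, l, k = 6, 2, 2
H = 0.5 * rng.standard_normal((l, n))
G = 0.5 * rng.standard_normal((k, n))
h = H @ (0.3 * np.ones(n))                 # feasible by construction
g = G @ (0.2 * np.ones(n))

def inner_saddle(lam, mu):
    """Closed-form saddle point of (2.4): min_x max_y [Phi - <mu,Hx> + <lam,Gy>]."""
    x = np.clip(H.T @ mu, -1.0, 1.0)       # argmin_x 0.5||x||^2 - <H^T mu, x>
    y = np.clip(G.T @ lam, -1.0, 1.0)      # argmax_y -0.5||y||^2 + <G^T lam, y>
    return x, y

lam, mu = np.zeros(k), np.zeros(l)
T = 20000
x_avg, y_avg = np.zeros(n), np.zeros(n)
for _ in range(T):
    x, y = inner_saddle(lam, mu)
    lam -= 0.1 * (G @ y - g)               # subgradient step in lambda (outer min)
    mu += 0.1 * (h - H @ x)                # supergradient step in mu (outer max)
    x_avg += x / T
    y_avg += y / T

# The averaged primal responses become nearly feasible: Hx ~ h, Gy ~ g.
print(np.linalg.norm(H @ x_avg - h), np.linalg.norm(G @ y_avg - g))
```

The averages of the inner saddle points reconstruct a nearly feasible primal solution; this is the same reconstruction mechanism that Sections 4 and 5 exploit in the infinite-dimensional setting.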

3 Differential games

Consider two moving objects with dynamics given by the following equations:

ẋ(t) = A_x(t)x(t) + B(t)u(t),   ẏ(t) = A_y(t)y(t) + C(t)v(t),   (x(0), y(0)) = (x0, y0).   (3.1)

Here x(t) ∈ R^n and y(t) ∈ R^m are the phase vectors of the objects, u(t) is the control of the first object (the pursuer), and v(t) is the control of the second object (the evader). The matrices A_x(t), A_y(t), B(t), and C(t) are continuous and have appropriate sizes. The system is considered on the time interval [0, θ]. The controls are restricted in the following way: u(t) ∈ P ⊂ R^p and v(t) ∈ Q ⊂ R^q for all t ∈ [0, θ], where P, Q are closed convex sets. The goal of the pursuer is to minimize the value of the functional

F(u, v) + Φ(x(θ), y(θ)) := ∫_0^θ F̃(τ, u(τ), v(τ)) dτ + Φ(x(θ), y(θ)).   (3.2)

The goal of the evader is the opposite. We need to find an optimal guaranteed result for each object, which leads to the problem of finding a saddle point of the above functional. We assume the following:

• u(·) ∈ L2([0, θ], R^p) and v(·) ∈ L2([0, θ], R^q);
• the saddle point in this class of strategies exists;
• the functional F(u, v) is upper semi-continuous in v and lower semi-continuous in u;
• Φ(x, y) is continuous.

Denote by V_x(t, τ) the transition matrix of the first system in (3.1). It is the unique solution of the following matrix Cauchy problem:

dV_x(t, τ)/dt = A_x(t)V_x(t, τ),  t ≥ τ,  V_x(τ, τ) = E.

Here E is the identity matrix. If the matrix A_x(t) ≡ A_x is constant, then V_x(t, τ) = e^{(t−τ)A_x}. Solving the first differential equation in (3.1), we can express x(θ) as the result of applying a linear operator B: L2([0, θ], R^p) → R^n:

x(θ) = V_x(θ, 0)x0 + ∫_0^θ V_x(θ, τ)B(τ)u(τ) dτ := x̃0 + Bu.   (3.3)
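As a quick numerical sanity check of formula (3.3) (our own sketch, not part of the paper), take the constant matrices A_x = [0 1; 0 0] and B = [0; 1], the same data as in the example of Section 4.1, for which V_x(t, τ) = e^{(t−τ)A_x} = [1 (t−τ); 0 1] in closed form, and compare the end point given by (3.3) with a direct integration of the ODE. The control u below is an arbitrary choice.

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])     # constant A_x, so V_x(t,tau) = expm((t-tau)A)
B = np.array([0.0, 1.0])
theta = 1.0
x0 = np.array([0.0, 0.0])
u = lambda t: np.cos(3.0 * t)              # an arbitrary control

# x(theta) via (3.3): here V_x(theta,tau) B = [theta - tau, 1]^T, and the
# integral is approximated by the midpoint rule.
N = 20000
taus = (np.arange(N) + 0.5) * (theta / N)
uvals = u(taus)
Bu = (theta / N) * np.array([np.sum((theta - taus) * uvals), np.sum(uvals)])
x_theta = np.array([[1.0, theta], [0.0, 1.0]]) @ x0 + Bu   # V_x(theta,0) x0 + Bu

# Cross-check: integrate xdot = A x + B u(t) with a fine explicit Euler scheme.
M = 100000
dt = theta / M
x = x0.copy()
for i in range(M):
    x = x + dt * (A @ x + B * u(i * dt))

print(x_theta, x)   # the two end points agree closely
```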

Below we will use the adjoint operator B* of the operator B. Let us write it out explicitly. Let µ be an n-dimensional vector. Then

⟨µ, Bu⟩ = ⟨µ, ∫_0^θ V_x(θ, τ)B(τ)u(τ) dτ⟩ = ∫_0^θ ⟨µ, V_x(θ, τ)B(τ)u(τ)⟩ dτ
= ∫_0^θ ⟨B^T(τ)V_x^T(θ, τ)µ, u(τ)⟩ dτ = ⟨B*µ, u⟩.

Note that the vector function ζ(t) = V_x^T(θ, t)µ is the solution of the following Cauchy problem:

ζ̇(t) = −A_x^T(t)ζ(t),  ζ(θ) = µ,  t ∈ [0, θ].

So we can solve this ODE and obtain B*µ from the solution ζ(t) as (B*µ)(t) = B^T(t)ζ(t). In the same way we introduce the transition matrix V_y(t, τ) of the second system in (3.1), the operator C: L2([0, θ], R^q) → R^m defined by the formula Cv = ∫_0^θ V_y(θ, τ)C(τ)v(τ) dτ, and the vector ỹ0 = V_y(θ, 0)y0. The adjoint operator C* can also be computed using the solution of a similar ODE. So our main problem of interest has the following form:

min_{u∈P} { max_{v∈Q} {F(u, v) + Φ(x, y) : y = ỹ0 + Cv} : x = x̃0 + Bu }.   (3.4)

The goal of this paper is to introduce a computational method for finding its approximate solution.

Remark 3.1. In the same way we can treat problems with objective functionals of the form ∫_0^θ F̃(τ, u(τ), v(τ)) dτ + Σ_{i=0}^k Φ(x(t_i), y(t_i)), or constraints of the type Bu ∈ T and Cv ∈ S, where T and S are closed convex sets.

3.1 Estimating the norms of operators B and C

Let us consider the problem of estimating the norms of the operators B and C. This is an important problem, since we need these operators to be bounded, and their norms play a significant role in the estimates of the rate of convergence of the methods below. We work with the operator B from (3.3), since the norm of C can be estimated in a similar way. The following argument was used in [2]; it is presented here for the reader's convenience. By definition, we have

‖B‖ = sup_{u∈L2([0,θ],R^p)} {‖Bu‖ : ‖u‖_{L2([0,θ],R^p)} = 1}.   (3.5)

As shown above, the adjoint operator B* transforms µ ∈ R^n into the function B^T(τ)V_x^T(θ, τ)µ ∈ L2([0, θ], R^p). Let us define the matrix

R = ∫_0^θ V_x(θ, τ)B(τ)B^T(τ)V_x^T(θ, τ) dτ = BB*,   (3.6)

which is symmetric and positive semidefinite.

Definition 3.1. The system with dynamics given by the first differential equation in (3.1) and initial value x(0) = 0 is called reachable on [0, θ] if for any x̂ ∈ R^n there exists a control such that x(θ) = x̂.

Reachability is closely related to the properties of the matrix R (see Corollary 2.3 in [1]).

Theorem 3.1. The system with dynamics given by the first differential equation in (3.1) and initial value x(0) = 0 is reachable on [0, θ] if and only if R is positive definite.

We also need the following result.

Lemma 3.1. Let H be a Hilbert space and let the linear operator A: H → R^L be nondegenerate: AA* ≻ 0. Then for any b ∈ R^L and f ∈ H, the Euclidean projection π_b(f) of f onto the affine subspace L_b = {g ∈ H : Ag = b} is given by π_b(f) = f + A*(AA*)^{−1}(b − Af).

From equation (3.5),

‖B‖ = ( inf_{u∈L2([0,θ],R^p)} {‖u‖_{L2([0,θ],R^p)} : ‖Bu‖ = 1} )^{−1}.

Using the reachability property, we have B(L2([0, θ], R^p)) = R^n and

inf_{u∈L2([0,θ],R^p)} {‖u‖_{L2} : ‖Bu‖ = 1} = inf_{u∈L2([0,θ],R^p), x∈R^n, ‖x‖=1} {‖u‖_{L2} : Bu = x}.

From Lemma 3.1 (applied with f = 0) and Theorem 3.1, we have

inf_{u∈L2([0,θ],R^p)} {‖u‖_{L2} : Bu = x} = ‖B*(BB*)^{−1}x‖ = ⟨(BB*)^{−1}x, x⟩^{1/2}.

Hence

inf_{u∈L2([0,θ],R^p)} {‖u‖_{L2} : ‖Bu‖ = 1} = inf_{‖x‖=1} ⟨(BB*)^{−1}x, x⟩^{1/2} = λ_min^{1/2}((BB*)^{−1}).

Finally, ‖B‖ = λ_min^{−1/2}((BB*)^{−1}) = λ_max^{1/2}(BB*), where BB* = R.

We can also get a time-independent estimate of ‖B‖ in the case when x = 0 is an exponentially stable equilibrium of the system with dynamics

ẋ(t) = A_x(t)x(t),  t ≥ 0,

where A_x(t) is continuous in time. Recall the following well-known result.

Theorem 3.2 ([1]). Assume that the matrix A_x(t) ≡ A_x is time-independent, and that there exists a matrix M = M^T ≻ 0 such that A_x^T M + M A_x ≺ 0. Then the equilibrium x = 0 is globally exponentially stable.

So we can consider the case when there exist ν > 0 and M = M^T ≻ 0 such that A_x^T M + M A_x ⪯ −νM. Let us also assume that the matrix B(t) ≡ B is time-independent. Then Bu is the position at the moment θ of the point moving along the unique trajectory defined by the linear system

ẋ(t) = A_x x(t) + Bu(t),  x(0) = 0.

Hence

‖x(θ)‖² = ⟨x(θ), x(θ)⟩ ≤ ⟨M x(θ), x(θ)⟩ / λ_min(M),

and

d/dt ⟨M x(t), x(t)⟩ = 2⟨M x(t), ẋ(t)⟩ = 2⟨M x(t), A_x x(t) + Bu(t)⟩
= ⟨(A_x^T M + M A_x)x(t), x(t)⟩ + 2⟨M x(t), Bu(t)⟩
≤ −ν⟨M x(t), x(t)⟩ + 2⟨M x(t), Bu(t)⟩ ≤ (1/ν)⟨M Bu(t), Bu(t)⟩.

Since x(0) = 0, we get

⟨M x(θ), x(θ)⟩ = ∫_0^θ d/dt ⟨M x(t), x(t)⟩ dt ≤ (1/ν) ∫_0^θ ⟨M Bu(t), Bu(t)⟩ dt ≤ (1/ν) λ_max(M) ∫_0^θ ‖Bu(t)‖² dt ≤ (1/ν) λ_max(M) ‖B‖²₂ ‖u‖²_{L2},

where ‖B‖₂ denotes the spectral norm of the matrix B. Finally, we have ‖Bu‖² ≤ (λ_max(M)/(ν λ_min(M))) ‖B‖²₂ ‖u‖²_{L2}, and therefore the norm of the operator B satisfies ‖B‖² ≤ (λ_max(M)/(ν λ_min(M))) ‖B‖²₂. So, in what follows we assume that both operators B and C have bounded norms.
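The identity ‖B‖ = λ_max^{1/2}(R) and the minimal-norm control of Lemma 3.1 can be checked numerically. The sketch below (our own illustration, not code from the paper) uses the double-integrator data of the example in Section 4.1, for which V_x(θ, τ)B = [θ − τ, 1]^T, and approximates all integrals by the midpoint rule.

```python
import numpy as np

theta, N = 1.0, 4000
dtau = theta / N
taus = (np.arange(N) + 0.5) * dtau
W = np.stack([theta - taus, np.ones(N)])   # columns: V_x(theta,tau) B = [theta-tau, 1]^T

# Controllability Gramian R = B B^* of (3.6), by midpoint quadrature.
R = (W * dtau) @ W.T
op_norm = np.sqrt(np.linalg.eigvalsh(R).max())        # ||B|| = lambda_max(R)^{1/2}

# The bound is attained by the control u ~ (B^* mu)(tau) with mu the top
# eigenvector of R; normalize u in L2 and evaluate ||B u||.
eigvals, eigvecs = np.linalg.eigh(R)
mu = eigvecs[:, -1]
u = W.T @ mu                                           # (B^* mu)(tau)
u /= np.sqrt(np.sum(u**2) * dtau)                      # unit L2 norm
Bu = (W * dtau) @ u
print(op_norm, np.linalg.norm(Bu))                     # nearly equal

# Lemma 3.1 with f = 0: the minimal-norm control reaching xhat is
# u_min = B^*(B B^*)^{-1} xhat, with ||u_min|| = <R^{-1} xhat, xhat>^{1/2}.
xhat = np.array([1.0, 0.0])
u_min = W.T @ np.linalg.solve(R, xhat)
reached = (W * dtau) @ u_min                           # B u_min = xhat
norm_u_min = np.sqrt(np.sum(u_min**2) * dtau)
print(reached, norm_u_min)
```

For this system R = [1/3 1/2; 1/2 1] exactly, so the computed operator norm can be compared with the closed-form eigenvalue ((4 + √13)/6)^{1/2}.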

4 Convex-concave problem

In this section we consider problem (3.4) under two assumptions.

A1. The sets P and Q are bounded.

A2. In (3.2), the functional F(·, v) is convex for any fixed v, F(u, ·) is concave for any fixed u, Φ(·, y) is convex for any fixed y, and Φ(x, ·) is concave for any fixed x.

Under A1, since the norms of the operators B and C are bounded, x(θ) and y(θ) are also bounded, and we can equivalently reformulate problem (3.4) in the following way:

min_{u∈P, x∈X} { max_{v∈Q, y∈Y} {F(u, v) + Φ(x, y) : y = ỹ0 + Cv} : x = x̃0 + Bu }
= max_{v∈Q, y∈Y} { min_{u∈P, x∈X} {F(u, v) + Φ(x, y) : x = x̃0 + Bu} : y = ỹ0 + Cv },   (4.1)

where the sets X and Y are closed, convex, and bounded.

Lemma 4.1. Let Assumptions A1 and A2 be true. Then problem (4.1) is equivalent to the problem

min_λ max_µ { min_{u∈P} max_{v∈Q} [F(u, v) − ⟨µ, Bu⟩ + ⟨λ, Cv⟩]
+ min_{x∈X} max_{y∈Y} [Φ(x, y) + ⟨µ, x⟩ − ⟨λ, y⟩] − ⟨µ, x̃0⟩ + ⟨λ, ỹ0⟩ },   (4.2)

which we call the conjugate problem to (4.1). Here λ ∈ R^m and µ ∈ R^n.

Proof. Let us consider the inner problem in (4.1). Due to A2, for each v ∈ Q and y ∈ Y this is a problem of minimization of a convex function over a convex set with linear constraints. Hence, it is equivalent to

χ(v, y) := min_{u∈P, x∈X} max_µ {F(u, v) + Φ(x, y) + ⟨µ, x − x̃0 − Bu⟩}.   (4.3)

Due to Assumptions A1 and A2, using the fact that any closed convex bounded set in a Hilbert space is compact in the weak topology, and taking into account that F(u, v) is upper semi-continuous in v and lower semi-continuous in u, by Corollary 3.3 in [3] we can swap min and max:

χ(v, y) = max_µ min_{u∈P, x∈X} {F(u, v) + Φ(x, y) + ⟨µ, x − x̃0 − Bu⟩}.

Note that χ(v, y) in (4.3) is a concave function of v and y. So the outer problem in (4.1) is a problem of maximization of a concave function over a convex set with linear constraints. Hence it is equivalent to max_{v∈Q, y∈Y} min_λ {χ(v, y) + ⟨λ, Cv + ỹ0 − y⟩}. Using the same arguments as above, we conclude that

max_{v∈Q, y∈Y} min_λ {χ(v, y) + ⟨λ, Cv + ỹ0 − y⟩} = min_λ max_{v∈Q, y∈Y} {χ(v, y) + ⟨λ, Cv + ỹ0 − y⟩}.

Hence, we have

(4.1) = min_λ max_{v∈Q, y∈Y} max_µ min_{u∈P, x∈X} {F(u, v) + Φ(x, y) + ⟨λ, Cv + ỹ0 − y⟩ + ⟨µ, x − x̃0 − Bu⟩}.

Swapping the two operations of maximization, we get

(4.1) = min_λ max_µ max_{v∈Q, y∈Y} min_{u∈P, x∈X} {F(u, v) + Φ(x, y) + ⟨λ, Cv + ỹ0 − y⟩ + ⟨µ, x − x̃0 − Bu⟩}.

Since the function F(u, v) + Φ(x, y) + ⟨λ, Cv + ỹ0 − y⟩ + ⟨µ, x − x̃0 − Bu⟩ is convex in (u, x) and concave in (v, y), and since P, Q, X, and Y are convex compact sets, we can swap max_{v∈Q, y∈Y} and min_{u∈P, x∈X}; since the resulting inner problem separates in (u, v) and (x, y), we obtain (4.1) = (4.2). □

We assume that the problems

ψ1(λ, µ) = min_{u∈P} max_{v∈Q} [F(u, v) − ⟨µ, Bu⟩ + ⟨λ, Cv⟩],   (4.4)
ψ2(λ, µ) = min_{x∈X} max_{y∈Y} [Φ(x, y) + ⟨µ, x⟩ − ⟨λ, y⟩]   (4.5)

are rather simple, so that they can be solved efficiently or in a closed form. Note that the conjugate problem is finite-dimensional. By Assumptions A1 and A2, since a closed convex bounded set in a Hilbert space is compact in the weak topology, and since F(u, v) is upper semi-continuous in v and lower semi-continuous in u, by Corollary 3.3 in [3] we conclude that the saddle points in problems (4.4), (4.5) exist for all λ ∈ R^m, µ ∈ R^n. Note that problem (4.4) has the following form:

min_{u∈P} max_{v∈Q} ∫_0^θ [F̃(τ, u(τ), v(τ)) − ⟨(B*µ)(τ), u(τ)⟩ + ⟨(C*λ)(τ), v(τ)⟩] dτ
= ∫_0^θ min_{u(τ)∈P} max_{v(τ)∈Q} [F̃(τ, u(τ), v(τ)) − ⟨(B*µ)(τ), u(τ)⟩ + ⟨(C*λ)(τ), v(τ)⟩] dτ,   (4.6)

and it can be solved pointwise.

Lemma 4.2. Let Assumptions A1 and A2 be true, and let (u*, v*) be a saddle point of problem (4.4) for some fixed λ ∈ R^m, µ ∈ R^n. Then the function ψ1(·, µ) is convex for any fixed µ, and its subgradient ∇_λ ψ1(λ, µ) = Cv* is bounded in (λ, µ). Similarly, the function ψ1(λ, ·) is concave for any fixed λ, and its supergradient ∇_µ ψ1(λ, µ) = −Bu* is bounded in (λ, µ).

Proof. Since the saddle point in problem (4.4) exists for any λ, µ, we have

ψ1(λ, µ) = min_{u∈P} [ψ̃1(u, λ) − ⟨Bu, µ⟩],   (4.7)
ψ1(λ, µ) = max_{v∈Q} [ψ̂1(v, µ) + ⟨Cv, λ⟩],   (4.8)

where

ψ̃1(u, λ) = max_{v∈Q} [F(u, v) + ⟨Cv, λ⟩],   (4.9)
ψ̂1(v, µ) = min_{u∈P} [F(u, v) − ⟨Bu, µ⟩].   (4.10)

In the first case, since we take a minimum of functions linear in µ, the result is concave in µ. So ψ1(λ, µ) is concave with respect to µ for any fixed λ. Similarly, we get that ψ1(λ, µ) is

convex in λ for any fixed µ. Let us fix λ and µ0. Denote by (u0*, v0*) the saddle point of problem (4.4) for (λ, µ0), and by (u*, v*) the saddle point of this problem for (λ, µ), where µ is arbitrary. Then

min_{u∈P} [ψ̃1(u, λ) − ⟨Bu, µ⟩] = ψ̃1(u*, λ) − ⟨Bu*, µ⟩,   min_{u∈P} [ψ̃1(u, λ) − ⟨Bu, µ0⟩] = ψ̃1(u0*, λ) − ⟨Bu0*, µ0⟩.

Note that

ψ1(λ, µ) − ψ1(λ, µ0) + ⟨Bu0*, µ − µ0⟩
= ψ̃1(u*, λ) − ⟨Bu*, µ⟩ − ψ̃1(u0*, λ) + ⟨Bu0*, µ0⟩ + ⟨Bu0*, µ − µ0⟩
= min_{u∈P} {ψ̃1(u, λ) − ⟨Bu, µ⟩} − (ψ̃1(u0*, λ) − ⟨Bu0*, µ⟩) ≤ 0.

So, by definition, the vector −Bu* is a supergradient of ψ1(λ, µ) with respect to µ. In the same way we prove that Cv* is a subgradient of ψ1(λ, µ) with respect to λ. Since P and Q are bounded, we have ‖u‖_{L2} ≤ √θ max_{u∈P} ‖u‖ := √θ ‖P‖ and, similarly, ‖v‖_{L2} ≤ √θ ‖Q‖. Then, since the norms ‖B‖ and ‖C‖ are bounded, we have ‖∇_λ ψ1(λ, µ)‖ ≤ √θ ‖C‖ ‖Q‖ and ‖∇_µ ψ1(λ, µ)‖ ≤ √θ ‖B‖ ‖P‖. □

In a similar way we can prove the following statement.

Lemma 4.3. Let Assumptions A1 and A2 be true, and let (x*, y*) be a saddle point of problem (4.5) for some given λ ∈ R^m, µ ∈ R^n. Then the function ψ2(·, µ) is convex for any fixed µ, and its subgradient ∇_λ ψ2(λ, µ) = −y* is bounded in (λ, µ). Similarly, the function ψ2(λ, ·) is concave for any fixed λ, and its supergradient ∇_µ ψ2(λ, µ) = x* is bounded in (λ, µ).

Combining Lemmas 4.2 and 4.3, we get that the function ψ(λ, µ) := ψ1(λ, µ) + ψ2(λ, µ) − ⟨µ, x̃0⟩ + ⟨λ, ỹ0⟩ is convex in λ and concave in µ, with partial subgradients ψ′_λ(λ, µ) = Cv* + ỹ0 − y* and ψ′_µ(λ, µ) = x* − x̃0 − Bu* satisfying the bounds

‖ψ′_λ(λ, µ)‖ ≤ L_λ := √θ ‖C‖ ‖Q‖ + ‖Y‖ + ‖ỹ0‖,   ‖ψ′_µ(λ, µ)‖ ≤ L_µ := √θ ‖B‖ ‖P‖ + ‖X‖ + ‖x̃0‖.   (4.11)

Here, for any set S ⊂ R^k, we use the notation ‖S‖ = max_{s∈S} ‖s‖.

4.1 Example of the problem (4.4)

Let us consider an example with n = 2, m = 2, θ = 1, p = q = 1, P = [−1, 1], Q = [−1, 1], and

A_x(t) = A_y(t) = [0 1; 0 0],   B(t) = C(t) = [0; 1],   x(0) = [0; 0],   y(0) = [1; 1].

The objective functional will be as follows:

J(u, v) = ∫_0^1 ( (u(t))²/2 − (v(t))²/2 ) dt + ½‖x(1) − y(1)‖² − ½‖y(1) − y0‖²,

where y0 = (2, 0)^T and the norm is Euclidean. This functional satisfies Assumptions A1 and A2. Note that

V_x(t, τ) = V_y(t, τ) = [1 (t−τ); 0 1],
(B*µ)(t) = B^T(t)V_x^T(1, t)µ = (1 − t)µ1 + µ2,   (C*λ)(t) = (1 − t)λ1 + λ2.

Also, using (4.6), we can solve (4.4) explicitly:

ψ1(λ, µ) = ∫_0^1 ( f((1 − t)µ1 + µ2) − f((1 − t)λ1 + λ2) ) dt,

where the function f: R → R is defined by

f(ρ) = −ρ²/2 if |ρ| ≤ 1,   f(ρ) = 1/2 − |ρ| if |ρ| > 1.
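This closed-form expression for ψ1 can be validated numerically. In the following sketch (our own, with arbitrarily chosen multipliers), the first function evaluates the integral with f as above by the midpoint rule, while the second solves the pointwise min-max in (4.6) directly by clipping:

```python
import numpy as np

def f(rho):
    """f(rho) = min over u in [-1, 1] of u^2/2 - rho*u."""
    return np.where(np.abs(rho) <= 1.0, -0.5 * rho**2, 0.5 - np.abs(rho))

N = 100000
t = (np.arange(N) + 0.5) / N                   # midpoint grid on [0, 1]

def psi1(lam, mu):
    return np.mean(f((1 - t) * mu[0] + mu[1]) - f((1 - t) * lam[0] + lam[1]))

def psi1_direct(lam, mu):
    rho_u = (1 - t) * mu[0] + mu[1]            # (B* mu)(t)
    rho_v = (1 - t) * lam[0] + lam[1]          # (C* lambda)(t)
    u = np.clip(rho_u, -1.0, 1.0)              # argmin of u^2/2 - rho_u u
    v = np.clip(rho_v, -1.0, 1.0)              # argmax of -v^2/2 + rho_v v
    return np.mean(0.5 * u**2 - 0.5 * v**2 - rho_u * u + rho_v * v)

lam, mu = np.array([0.7, -0.4]), np.array([1.5, 0.2])
print(psi1(lam, mu), psi1_direct(lam, mu))     # identical up to rounding
```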

4.2 Estimating the norms of λ* and µ*

Let us estimate the norms of the components λ* and µ* of the solution of the conjugate problem.

Lemma 4.4. Assume that

Δ_{ux} := max_{u,ũ∈P, x,x̃∈X, v∈Q, y∈Y} {−F′(ũ, u − ũ | v) − Φ′(x̃, x − x̃ | y)} < +∞

and

Δ_{vy} := max_{v,ṽ∈Q, y,ỹ∈Y, u∈P, x∈X} {F′(u | ṽ, v − ṽ) + Φ′(x | ỹ, y − ỹ)} < +∞.

If

B_r(0) ⊂ {s = Bu + x̃0 − x : u ∈ P, x ∈ X},   B_r(0) ⊂ {s = Cv + ỹ0 − y : v ∈ Q, y ∈ Y},   (4.12)

then ‖µ*‖ ≤ Δ_{ux}/r and ‖λ*‖ ≤ Δ_{vy}/r.

Proof. Consider the function ψ(λ*, µ). It is concave and achieves its maximum at µ*. We also know that

∂_µ ψ(λ*, µ) = {x_µ − x̃0 − Bu_µ : u_µ ∈ P, x_µ ∈ X, ψ̃1(u_µ, λ*) + ψ̃2(x_µ, λ*) + ⟨x_µ − x̃0 − Bu_µ, µ⟩ = ψ(λ*, µ)}.

From (4.9), and from the similar definition of ψ̃2(x_µ, λ*), we get

ψ̃1′(u_µ, u − u_µ | λ*) = F′(u_µ, u − u_µ | v*(u_µ, λ*)),   ψ̃2′(x_µ, x − x_µ | λ*) = Φ′(x_µ, x − x_µ | y*(x_µ, λ*)).

From these equalities and the optimality conditions for the problem defining ψ(λ*, µ), we get

F′(u_µ, u − u_µ | v*(u_µ, λ*)) + Φ′(x_µ, x − x_µ | y*(x_µ, λ*)) + ⟨x − x_µ − Bu + Bu_µ, µ⟩ ≤ 0 for all u ∈ P, x ∈ X,

or

⟨x − Bu − x̃0, µ⟩ ≤ −F′(u_µ, u − u_µ | v*(u_µ, λ*)) − Φ′(x_µ, x − x_µ | y*(x_µ, λ*)) + ⟨µ, ψ′_µ(λ*, µ)⟩ for all u ∈ P, x ∈ X.

Hence

max_{u∈P, x∈X} ⟨x − Bu − x̃0, µ⟩ ≤ Δ_{ux} + ⟨µ, ψ′_µ(λ*, µ)⟩.

Using the first inclusion in (4.12) and the fact that 0 ∈ ∂_µ ψ(λ*, µ*), we get ‖µ*‖ ≤ Δ_{ux}/r. The estimate for ‖λ*‖ is proved in the same manner. □

Let us consider an example of a situation when the inclusions (4.12) hold. If the set X has nonempty interior and there exists ū ∈ P such that Bū + x̃0 = x̄ ∈ int X, then there exists r > 0 such that x̄ + B_r(0) ⊂ X. Then we have

B_r(0) = Bū + x̃0 − x̄ + B_r(0) ⊂ {s = Bu + x̃0 − x : u ∈ P, x ∈ X}.

Similar arguments can be used for the second inclusion in (4.12).

Let us consider another example. Since problem (3.4) does not have any constraints on x and y, and the sets X and Y were introduced only thanks to the boundedness of the norms of the operators B and C, we can apply the following reasoning. Assume that there exist ū ∈ P and r0 > 0 such that P ⊂ B_{r0}(ū). Then for every u ∈ P there exists ũ ∈ B_{r0}(0) such that u = ū + ũ. Hence, for every u ∈ P, x_u(θ) = x̃0 + Bū + Bũ and ‖x_u(θ) − x̃0 − Bū‖ ≤ ‖Bũ‖ ≤ ‖B‖ r0. So if we choose X = B_{‖B‖r0}(x̃0 + Bū), we are in the situation of the previous example and can take r = ‖B‖ r0 in the conditions of Lemma 4.4. For the second inclusion in (4.12) we can apply similar arguments.

4.3 Algorithm description

We assume that we are given some norm in the space R^m of multipliers λ and a prox-function d_λ(λ) with prox-center λ0, which is strongly convex with convexity parameter σ_λ with respect to this norm. For µ we introduce similar assumptions. Since (λ*, µ*) is the saddle point, by definition we have the following inequalities:

ψ(λ*, µ) ≤ ψ(λ*, µ*) ≤ ψ(λ, µ*)  for all λ, µ.

From the convexity of the function ψ(λ, µ) with respect to λ, by the definition of the partial subgradient ψ′_λ(λ, µ) at the point (λ, µ), we have

ψ(λ*, µ) ≥ ψ(λ, µ) + ⟨ψ′_λ(λ, µ), λ* − λ⟩  for all λ, µ.

Similarly, using the concavity of ψ(λ, µ) with respect to µ, we have

ψ(λ, µ*) ≤ ψ(λ, µ) + ⟨ψ′_µ(λ, µ), µ* − µ⟩  for all λ, µ.

Finally, from the above inequalities we obtain

⟨ψ′_λ(λ, µ), λ − λ*⟩ + ⟨−ψ′_µ(λ, µ), µ − µ*⟩ ≥ 0  for all λ, µ.

Hence, (λ*, µ*) is a weak solution of the variational inequality

⟨g(λ, µ), (λ − λ*, µ − µ*)⟩ ≥ 0  for all λ, µ,

where g(λ, µ) = (ψ′_λ(λ, µ), −ψ′_µ(λ, µ)). All of this allows us to apply the method of Simple Dual Averages (SDA) from [4] for finding an approximate solution of the finite-dimensional problem (4.2).

Let us choose some κ ∈ (0, 1). As in Section 4 of [4], we consider the space of points z := (λ, µ) with the norm

‖z‖ = ( κσ_λ ‖λ‖² + (1 − κ)σ_µ ‖µ‖² )^{1/2},   (4.13)

the oracle g(z) = (g_λ(z), −g_µ(z)), and the new prox-function d(z) = κ d_λ(λ) + (1 − κ) d_µ(µ), which is strongly convex with constant σ0 = 1 with respect to the norm (4.13). We define W = R^m × R^n. The norm conjugate to (4.13) is

‖g‖* = ( (1/(κσ_λ)) ‖g_λ‖²_{λ,*} + (1/((1 − κ)σ_µ)) ‖g_µ‖²_{µ,*} )^{1/2}.

So we have a uniform upper bound for the answers of the oracle: ‖g‖²* ≤ L² := L_λ²/(κσ_λ) + L_µ²/((1 − κ)σ_µ), where L_λ and L_µ are defined in (4.11). The SDA method for solving (4.2) is as follows.

1. Initialization: Set s0 = 0. Choose z0 and γ > 0.

2. Iteration (k ≥ 0): Compute g_k = g(z_k). Set s_{k+1} = s_k + g_k, β_{k+1} = γ β̂_{k+1}, and z_{k+1} = π_{β_{k+1}}(−s_{k+1}).   (M1)

Here {β̂_k} is the auxiliary sequence used in [4] (β̂_0 = β̂_1 = 1, β̂_{k+1} = β̂_k + 1/β̂_k); in accordance with Lemma 3 in [4], for k ≥ 1 it satisfies the inequalities

√(2k − 1) ≤ β̂_k ≤ 1/(1 + √3) + √(2k − 1).

The mapping π_β(s) is defined in the following way:

π_β(s) = arg min_{z∈W} {−⟨s, z⟩ + β d(z)}.

Since the saddle point in problem (4.1) exists, there exists a saddle point (λ*, µ*) of the conjugate problem (4.2). According to Theorem 1 in [4], method (M1) generates a bounded sequence {z_i}_{i≥0}. Hence the sequences {λ_i}_{i≥0} and {µ_i}_{i≥0} are also bounded, so we can choose D_λ, D_µ such that d_λ(λ_i) ≤ D_λ and d_µ(µ_i) ≤ D_µ for all i ≥ 0. Also, the pair (λ*, µ*) is an interior solution: B_{r/√(κσ_λ)}(λ*) ⊆ W_λ := {λ : d_λ(λ) ≤ D_λ} and B_{r/√((1−κ)σ_µ)}(µ*) ⊆ W_µ := {µ : d_µ(µ) ≤ D_µ} for some r > 0. Then we have z* := (λ*, µ*) ∈ F_D := {z ∈ W : d(z) ≤ D} with D = κD_λ + (1 − κ)D_µ, and B_r(z*) ⊆ F_D.
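For intuition, here is a minimal sketch of method (M1) with the Euclidean prox-function d(z) = ½‖z − z0‖², so that σ0 = 1 and π_β(−s) = z0 − s/β has a closed form. It is applied to the small convex-concave test function ψ(λ, µ) = ‖λ‖₁ − ‖µ‖₁ + ½⟨λ, µ⟩, whose saddle point is the origin. The test function, the starting point, and the concrete β̂ recursion are our own illustrative choices, not data from the paper.

```python
import numpy as np

d = 2
z0 = np.concatenate([np.ones(d), -np.ones(d)])   # starting point (lambda0, mu0)

def oracle(z):
    """g(z) = (psi'_lambda, -psi'_mu) for psi = ||l||_1 - ||m||_1 + 0.5<l, m>."""
    lam, mu = z[:d], z[d:]
    return np.concatenate([np.sign(lam) + 0.5 * mu, np.sign(mu) - 0.5 * lam])

gamma, beta_hat, T = 1.0, 1.0, 200000
z, s, z_sum = z0.copy(), np.zeros(2 * d), np.zeros(2 * d)
for _ in range(T):
    z_sum += z
    s += oracle(z)                     # s_{k+1} = s_k + g(z_k)
    beta_hat += 1.0 / beta_hat         # hat-beta recursion, beta_hat_k ~ sqrt(2k)
    z = z0 - s / (gamma * beta_hat)    # z_{k+1} = pi_{gamma hat-beta}(-s_{k+1})

z_avg = z_sum / T
print(np.linalg.norm(z_avg, 1))        # the averaged iterate approaches (0, 0)
```

In the setting of this section the oracle call would instead solve the inner problems (4.4) and (4.5), and the primal saddle points produced along the way would be averaged as in (4.16) below.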

Let us introduce the gap function

δ_k(D) = max_z { Σ_{i=0}^k ⟨g_i, z_i − z⟩ : z ∈ F_D }.   (4.14)

From Theorem 2 in [4] (equation (4.6) there) we have

(1/(k+1)) δ_k(D) ≤ (β̂_{k+1}/(k+1)) ( γD + L²/(2σ0γ) ).   (4.15)

Denote

û_{k+1} = (1/(k+1)) Σ_{i=0}^k u_i,   v̂_{k+1} = (1/(k+1)) Σ_{i=0}^k v_i,   x̂_{k+1} = (1/(k+1)) Σ_{i=0}^k x_i,   ŷ_{k+1} = (1/(k+1)) Σ_{i=0}^k y_i,   (4.16)

where (u_i, v_i) and (x_i, y_i) are the saddle points at the point (λ_i, µ_i) in (4.4) and (4.5), respectively. Note that for all u ∈ P, v ∈ Q, x ∈ X, and y ∈ Y we have

F(u, v_i) + Φ(x, y_i) + ⟨µ_i, x − x̃0 − Bu⟩ + ⟨λ_i, Cv_i + ỹ0 − y_i⟩ ≥ ψ(λ_i, µ_i)
≥ F(u_i, v) + Φ(x_i, y) + ⟨µ_i, x_i − x̃0 − Bu_i⟩ + ⟨λ_i, Cv + ỹ0 − y⟩.   (4.17)

We define the function

φ(u, x, v, y) = min_λ max_µ { F(u, v) + Φ(x, y) + ⟨µ, x − x̃0 − Bu⟩ + ⟨λ, Cv + ỹ0 − y⟩ : d_λ(λ) ≤ D_λ, d_µ(µ) ≤ D_µ }.   (4.18)

Since d_λ(λ*) ≤ D_λ, d_µ(µ*) ≤ D_µ, and the conjugate problem is equivalent to the initial one, we conclude that the initial problem is equivalent to the problem

min_{u∈P, x∈X} max_{v∈Q, y∈Y} φ(u, x, v, y).   (4.19)

Let us introduce two auxiliary functions:

ξ(u, x) = max_{v∈Q, y∈Y} φ(u, x, v, y),   (4.20)
η(v, y) = min_{u∈P, x∈X} φ(u, x, v, y).   (4.21)

Note that ξ(u, x) is convex, η(v, y) is concave, and ξ(u, x) ≥ φ(u*, x*, v*, y*) ≥ η(v, y) for all u ∈ P, v ∈ Q, x ∈ X, y ∈ Y, where φ(u*, x*, v*, y*) is the optimal value of (4.19).

Theorem 4.1. Let Assumptions A1 and A2 be true. Then the points (4.16) generated by method (M1) satisfy

ξ(û_{k+1}, x̂_{k+1}) − η(v̂_{k+1}, ŷ_{k+1}) ≤ (β̂_{k+1}/(k+1)) ( γD + L²/(2γ) ),   (4.22)

‖x̃0 + Bû_{k+1} − x̂_{k+1}‖ ≤ (√((1−κ)σ_µ) β̂_{k+1}/(r(k+1))) ( γD + L²/(2γ) ),
‖ỹ0 + Cv̂_{k+1} − ŷ_{k+1}‖ ≤ (√(κσ_λ) β̂_{k+1}/(r(k+1))) ( γD + L²/(2γ) ).   (4.23)

Proof. From inequalities (4.17), by the convexity of F(·, v) and Φ(·, y), we have

F(û_{k+1}, v) + Φ(x̂_{k+1}, y) + ⟨µ, x̂_{k+1} − x̃0 − Bû_{k+1}⟩ + ⟨λ, Cv + ỹ0 − y⟩
≤ (1/(k+1)) Σ_{i=0}^k ψ(λ_i, µ_i) + (1/(k+1)) Σ_{i=0}^k ⟨µ − µ_i, x_i − x̃0 − Bu_i⟩ + (1/(k+1)) Σ_{i=0}^k ⟨λ − λ_i, Cv + ỹ0 − y⟩.

This gives us

ξ(û_{k+1}, x̂_{k+1}) ≤ (1/(k+1)) Σ_{i=0}^k ψ(λ_i, µ_i) + max_µ { (1/(k+1)) Σ_{i=0}^k ⟨µ − µ_i, x_i − x̃0 − Bu_i⟩ : d_µ(µ) ≤ D_µ }
+ max_{v∈Q, y∈Y} min_λ { (1/(k+1)) Σ_{i=0}^k ⟨λ − λ_i, Cv + ỹ0 − y⟩ : d_λ(λ) ≤ D_λ }.   (4.24)

Since method (M1) generates points λ_i which satisfy d_λ(λ_i) ≤ D_λ, and since there exist v1 ∈ Q and y1 ∈ Y such that Cv1 + ỹ0 = y1, we conclude that ((1/(k+1)) Σ_{i=0}^k λ_i, v1, y1) is a saddle point of the third term, and

max_{v∈Q, y∈Y} min_λ { (1/(k+1)) Σ_{i=0}^k ⟨λ − λ_i, Cv + ỹ0 − y⟩ : d_λ(λ) ≤ D_λ } = 0.

Similarly, we have

η(v̂_{k+1}, ŷ_{k+1}) ≥ (1/(k+1)) Σ_{i=0}^k ψ(λ_i, µ_i) − max_λ { (1/(k+1)) Σ_{i=0}^k ⟨λ_i − λ, Cv_i + ỹ0 − y_i⟩ : d_λ(λ) ≤ D_λ }.

Finally, we have

ξ(û_{k+1}, x̂_{k+1}) − η(v̂_{k+1}, ŷ_{k+1})
≤ (1/(k+1)) ( max_µ { Σ_{i=0}^k ⟨µ − µ_i, x_i − x̃0 − Bu_i⟩ : d_µ(µ) ≤ D_µ } + max_λ { Σ_{i=0}^k ⟨λ_i − λ, Cv_i + ỹ0 − y_i⟩ : d_λ(λ) ≤ D_λ } )
≤ (1/(k+1)) max_z { Σ_{i=0}^k ⟨g_i, z_i − z⟩ : d(z) ≤ κD_λ + (1 − κ)D_µ } = (1/(k+1)) δ_k(D).

Combining this with (4.15), we get (4.22).

Let us prove that (4.16) is also a nearly feasible solution. Obviously,

(1/((1−κ)σ_µ)) ‖Bû_{k+1} + x̃0 − x̂_{k+1}‖² ≤ (1/((1−κ)σ_µ)) ‖Bû_{k+1} + x̃0 − x̂_{k+1}‖² + (1/(κσ_λ)) ‖Cv̂_{k+1} + ỹ0 − ŷ_{k+1}‖²,
(1/(κσ_λ)) ‖Cv̂_{k+1} + ỹ0 − ŷ_{k+1}‖² ≤ (1/((1−κ)σ_µ)) ‖Bû_{k+1} + x̃0 − x̂_{k+1}‖² + (1/(κσ_λ)) ‖Cv̂_{k+1} + ỹ0 − ŷ_{k+1}‖².

On the other hand, from the proof of the third item of Theorem 1 in [4] we have

( δ_k(D)/(r(k+1)) )² ≥ ‖ (1/(k+1)) ŝ_{k+1} ‖²* = ‖ (1/(k+1)) Σ_{i=0}^k (Cv_i + ỹ0 − y_i, Bu_i + x̃0 − x_i) ‖²*
= (1/((1−κ)σ_µ)) ‖Bû_{k+1} + x̃0 − x̂_{k+1}‖² + (1/(κσ_λ)) ‖Cv̂_{k+1} + ỹ0 − ŷ_{k+1}‖².

This, in combination with (4.15), gives us (4.23). □

So we conclude that (û_{k+1}, v̂_{k+1}, x̂_{k+1}, ŷ_{k+1}) is a nearly optimal and nearly feasible point, with an error of order O(1/√(k+1)).
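To close the section, here is an end-to-end sketch on the toy game of Section 4.1. For brevity we replace the SDA method (M1) by a plain averaged subgradient descent-ascent in (λ, µ) with steps of order 1/√k; this substitution, the box radius R defining X and Y, the step constant, and the grid size are all our own choices, not data from the paper. The inner problems (4.4) and (4.5) are solved in closed form: (4.4) pointwise by clipping, and (4.5) by exploiting the fact that Φ of this example is affine in y.

```python
import numpy as np

N = 500
dt = 1.0 / N
tgrid = (np.arange(N) + 0.5) * dt
w = np.stack([1.0 - tgrid, np.ones(N)])          # V(1,t)B = V(1,t)C = [1-t, 1]^T
x0t = np.zeros(2)                                 # tilde-x0 = V_x(1,0) x(0)
y0t = np.array([2.0, 1.0])                        # tilde-y0 = V_y(1,0) y(0)
yref = np.array([2.0, 0.0])                       # the point y0 in Phi
R = 5.0                                           # box radius for X and Y (our choice)

def saddle_psi2(lam, mu):
    # Phi(x,y) + <mu,x> - <lam,y> = 0.5||x||^2 + <mu,x> - 0.5||yref||^2
    #                               + <y, yref - x - lam>, i.e. affine in y.
    # Minimize over x coordinatewise (quadratic plus R|c - x|), assuming the
    # minimizer stays inside X = [-R, R]^2 for the multipliers encountered.
    c = yref - lam
    x = np.minimum(np.maximum(c, -R - mu), R - mu)
    return x, x + mu                              # the saddle y equals x + mu

lam, mu = np.zeros(2), np.zeros(2)
T = 20000
avg = {name: np.zeros(2) for name in ("x", "y", "Bu", "Cv")}
for k in range(T):
    ustar = np.clip(w.T @ mu, -1.0, 1.0)          # pointwise solution of (4.4)
    vstar = np.clip(w.T @ lam, -1.0, 1.0)
    Bu, Cv = (w * dt) @ ustar, (w * dt) @ vstar
    x, y = saddle_psi2(lam, mu)
    step = 0.5 / np.sqrt(k + 1.0)
    lam, mu = lam - step * (Cv + y0t - y), mu + step * (x - x0t - Bu)
    for name, val in (("x", x), ("y", y), ("Bu", Bu), ("Cv", Cv)):
        avg[name] += val / T

# Near-feasibility of the averaged primal pair, in the spirit of Theorem 4.1:
print(np.linalg.norm(avg["x"] - x0t - avg["Bu"]),
      np.linalg.norm(avg["y"] - y0t - avg["Cv"]))
```

The two printed residuals measure how well the averaged end points match the averaged trajectories of the two players; by Theorem 4.1 they decay at the rate O(1/√(k+1)).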

5 Strongly convex-concave problem

In this section we consider problem (3.4) satisfying the following assumptions.

A3. The functional F(·, v) is strongly convex for any fixed v with constant σ_F^u which does not depend on v, and F(u, ·) is strongly concave for any fixed u with constant σ_F^v which does not depend on u. Assume also that

‖∇_u F(u, v1) − ∇_u F(u, v2)‖ ≤ L_{uv} ‖v1 − v2‖,   (5.1)
‖∇_v F(u1, v) − ∇_v F(u2, v)‖ ≤ L_{vu} ‖u1 − u2‖.   (5.2)

A4. Φ(·, y) is strongly convex for any fixed y with constant σ_Φ^x which does not depend on y, and Φ(x, ·) is strongly concave for any fixed x with constant σ_Φ^y which does not depend on x. Assume also that

‖∇_x Φ(x, y1) − ∇_x Φ(x, y2)‖ ≤ L_{xy} ‖y1 − y2‖,   (5.3)
‖∇_y Φ(x1, y) − ∇_y Φ(x2, y)‖ ≤ L_{yx} ‖x1 − x2‖,   (5.4)

and

‖∇_x Φ(x1, y) − ∇_x Φ(x2, y)‖ ≤ L_{xx} ‖x1 − x2‖,   (5.5)
‖∇_y Φ(x, y1) − ∇_y Φ(x, y2)‖ ≤ L_{yy} ‖y1 − y2‖.   (5.6)

Note that assumptions A3, A4 imply that the level sets of functions F (u, v), Φ(x, y) are closed convex and bounded. Similarly to the proof of Lemma 4.1, we get that the conjugate problem for (3.4) is minλ maxµ { minu∈P maxv∈Q [F (u, v) − hµ, Bui + hλ, Cvi] + minx maxy [Φ(x, y) + hµ, xi − hλ, yi] − hµ, x˜0 i + hλ, y˜0 i }.

(5.7)

Here λ ∈ Rn and µ ∈ Rm . We assume that problems ψ1 (λ, µ) = min max [F (u, v) − hµ, Bui + hλ, Cvi] ,

(5.8)

ψ2 (λ, µ) = min max [Φ(x, y) + hµ, xi − hλ, yi]

(5.9)

u∈P v∈Q x

y

are simple, which means that they can be solved efficiently or in a closed form (see example in Section 4). Note that the conjugate problem is finite-dimensional. Using assumptions A3, A4, the fact that closed convex bounded set in Hilbert space is compact in weak topology, the fact that F (u, v) is upper semi-continuous in v and lower semi-continuous in u, and Corollary 3.3 in [3], we conclude that the saddle points in the problems (5.8), (5.9) do exist for all λ ∈ Rn and µ ∈ Rm . 16
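To illustrate what "simple" means here, consider a hypothetical quadratic instance (illustrative only, not taken from this paper): $\Phi(x,y) = \frac{1}{2}\|x\|^2 - \frac{1}{2}\|y\|^2$. Problem (5.9) then admits a closed-form saddle point $x^* = -\mu$, $y^* = -\lambda$, with value $\psi_2(\lambda,\mu) = -\frac{1}{2}\|\mu\|^2 + \frac{1}{2}\|\lambda\|^2$, which a few lines of code can evaluate and sanity-check:

```python
import numpy as np

# Hypothetical instance (illustrative, not from the paper):
# Phi(x, y) = 0.5*||x||^2 - 0.5*||y||^2, so problem (5.9),
#   psi_2(lam, mu) = min_x max_y [Phi(x, y) + <mu, x> - <lam, y>],
# has the closed-form saddle point x* = -mu, y* = -lam and the value
#   psi_2(lam, mu) = -0.5*||mu||^2 + 0.5*||lam||^2.
def psi2_closed_form(lam, mu):
    x_star, y_star = -mu, -lam
    value = (0.5 * x_star @ x_star - 0.5 * y_star @ y_star
             + mu @ x_star - lam @ y_star)
    return value, x_star, y_star

value, x_star, y_star = psi2_closed_form(np.array([1.0, 0.0]),
                                         np.array([0.0, 2.0]))
# value = -0.5*||mu||^2 + 0.5*||lam||^2 = -2.0 + 0.5 = -1.5
```

For such instances the dual oracle is a constant-time computation, which is exactly the regime in which the finite-dimensional conjugate problem is attractive.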

Lemma 5.1. Let Assumption A3 be true. Then $\psi_1(\lambda,\mu)$ in (5.8) is smooth with partial gradients satisfying the following Lipschitz conditions:

$$\|\nabla_\mu\psi_1(\lambda_1,\mu_1) - \nabla_\mu\psi_1(\lambda_2,\mu_2)\| \le \frac{\|B\|^2}{\sigma_F^u}\|\mu_1-\mu_2\| + \frac{\|B\|\,\|C\|\,L_{uv}}{\sigma_F^u\sigma_F^v}\|\lambda_1-\lambda_2\|, \qquad (5.10)$$
$$\|\nabla_\lambda\psi_1(\lambda_1,\mu_1) - \nabla_\lambda\psi_1(\lambda_2,\mu_2)\| \le \frac{\|C\|^2}{\sigma_F^v}\|\lambda_1-\lambda_2\| + \frac{\|B\|\,\|C\|\,L_{vu}}{\sigma_F^u\sigma_F^v}\|\mu_1-\mu_2\|. \qquad (5.11)$$

Proof. From Lemma 4.2 we know that $\psi_{1\mu}'(\lambda,\mu) = -Bu^*(\lambda,\mu)$. From the strong convexity of $F(\cdot,v)$, we have for any $t\in[0,1]$

$$\tilde\psi_1(tu_1+(1-t)u_2,\lambda) = \max_{v\in Q}\big[F(tu_1+(1-t)u_2,v) + \langle Cv,\lambda\rangle\big]$$
$$\le \max_{v\in Q}\Big[tF(u_1,v) + (1-t)F(u_2,v) - \frac{\sigma_F^u}{2}\,t(1-t)\|u_1-u_2\|^2 + \langle Cv,\lambda\rangle\Big]$$
$$\le t\max_{v\in Q}\big[F(u_1,v)+\langle Cv,\lambda\rangle\big] + (1-t)\max_{v\in Q}\big[F(u_2,v)+\langle Cv,\lambda\rangle\big] - \frac{\sigma_F^u}{2}\,t(1-t)\|u_1-u_2\|^2$$
$$= t\tilde\psi_1(u_1,\lambda) + (1-t)\tilde\psi_1(u_2,\lambda) - \frac{\sigma_F^u}{2}\,t(1-t)\|u_1-u_2\|^2.$$

So, by definition, the function $\tilde\psi_1(u,\lambda)$ is strongly convex with constant $\sigma_F^u$. This means that the optimal point $u^*$ in (4.7) is unique and that $\psi_1(\lambda,\mu)$ is smooth with respect to $\mu$. Hence $\nabla_\mu\psi_1(\lambda,\mu) = -Bu^*(\lambda,\mu)$.

Since $F(u,\cdot)$ is strongly concave for any fixed $u$, we get that the solution $v^*$ of (4.9) is unique and the function $\tilde\psi_1(u,\lambda)$ is smooth with respect to $u$. Denote by $u_i$ the optimal point in (4.7) for some $\lambda,\mu_i$, $i=1,2$. From the first-order optimality conditions for (4.7) we have

$$\langle \nabla_u\tilde\psi_1(u_1,\lambda) - B^*\mu_1,\, u_2-u_1\rangle \ge 0,$$
$$\langle \nabla_u\tilde\psi_1(u_2,\lambda) - B^*\mu_2,\, u_1-u_2\rangle \ge 0.$$

Adding these inequalities and using the strong convexity of $\tilde\psi_1(u,\lambda)$, we continue as follows:

$$\langle B^*(\mu_1-\mu_2),\, u_1-u_2\rangle \ge \langle \nabla_u\tilde\psi_1(u_1,\lambda) - \nabla_u\tilde\psi_1(u_2,\lambda),\, u_1-u_2\rangle \ge \sigma_F^u\|u_1-u_2\|^2.$$

Finally we have

$$\|Bu_1-Bu_2\|^2 \le \|B\|^2\|u_1-u_2\|^2 \le \frac{\|B\|^2}{\sigma_F^u}\langle B^*(\mu_1-\mu_2),\, u_1-u_2\rangle \le \frac{\|B\|^2}{\sigma_F^u}\|\mu_1-\mu_2\|\,\|B(u_1-u_2)\|. \qquad (5.12)$$

Denote by $(u_i,v_i)$ the saddle point in (5.8) for some $\lambda_i,\mu$ and $i=1,2$. Similarly to the previous case, we conclude that $\hat\psi_1(v,\mu)$ is strongly concave in $v$ with constant $\sigma_F^v$ and smooth with respect to $v$. As above, from the first-order optimality conditions for (4.8) and the strong concavity of $\hat\psi_1(v,\mu)$, we have

$$\langle C^*(\lambda_1-\lambda_2),\, v_1-v_2\rangle \ge \sigma_F^v\|v_1-v_2\|^2.$$

This gives us

$$\|v_1-v_2\| \le \frac{\|C\|}{\sigma_F^v}\|\lambda_1-\lambda_2\|.$$

From the first-order optimality conditions for (4.10) we get

$$\langle \nabla_u F(u_2,v_2) - \nabla_u F(u_1,v_1),\, u_1-u_2\rangle \ge 0.$$

Using that $F(\cdot,v)$ is strongly convex for any fixed $v$, this gives us

$$\langle \nabla_u F(u_2,v_2) - \nabla_u F(u_2,v_1),\, u_1-u_2\rangle \ge \sigma_F^u\|u_1-u_2\|^2.$$

From this, using (5.1) and the estimate for $\|v_1-v_2\|$, we get

$$\|u_1-u_2\| \le \frac{\|C\|\,L_{uv}}{\sigma_F^u\sigma_F^v}\|\lambda_1-\lambda_2\|.$$

Finally we have

$$\|Bu_1-Bu_2\| \le \|B\|\,\|u_1-u_2\| \le \frac{\|B\|\,\|C\|\,L_{uv}}{\sigma_F^u\sigma_F^v}\|\lambda_1-\lambda_2\|. \qquad (5.13)$$

Combining (5.12) and (5.13), we get (5.10). Estimate (5.11) can be proved in the same manner.

In a similar way, we can prove the following statement.

Lemma 5.2. Let Assumption A4 be true. Then the function $\psi_2(\lambda,\mu)$ in (5.9) is smooth with partial gradients satisfying the following Lipschitz conditions:

$$\|\nabla_\mu\psi_2(\lambda_1,\mu_1) - \nabla_\mu\psi_2(\lambda_2,\mu_2)\| \le \frac{1}{\sigma_\Phi^x}\|\mu_1-\mu_2\| + \frac{L_{xy}}{\sigma_\Phi^x\sigma_\Phi^y}\|\lambda_1-\lambda_2\|, \qquad (5.14)$$
$$\|\nabla_\lambda\psi_2(\lambda_1,\mu_1) - \nabla_\lambda\psi_2(\lambda_2,\mu_2)\| \le \frac{1}{\sigma_\Phi^y}\|\lambda_1-\lambda_2\| + \frac{L_{yx}}{\sigma_\Phi^x\sigma_\Phi^y}\|\mu_1-\mu_2\|. \qquad (5.15)$$

Combining Lemmas 5.1 and 5.2, we get that the function $\psi(\lambda,\mu) \stackrel{\mathrm{def}}{=} \psi_1(\lambda,\mu) + \psi_2(\lambda,\mu) - \langle\mu,\tilde x_0\rangle + \langle\lambda,\tilde y_0\rangle$ is convex in $\lambda$ and concave in $\mu$ with partial gradients

$$\nabla_\lambda\psi(\lambda,\mu) = Cv^* + \tilde y_0 - y^*, \qquad (5.16)$$
$$\nabla_\mu\psi(\lambda,\mu) = x^* - \tilde x_0 - Bu^*, \qquad (5.17)$$

where $(u^*,v^*)$ is the saddle point of problem (5.8), and $(x^*,y^*)$ is the saddle point of problem (5.9).
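As a sanity check of gradient formulas of this kind, take again a hypothetical quadratic instance $\Phi(x,y) = \frac{1}{2}\|x\|^2 - \frac{1}{2}\|y\|^2$ (illustrative only, not from the paper). Then $\psi_2(\lambda,\mu) = \frac{1}{2}\|\lambda\|^2 - \frac{1}{2}\|\mu\|^2$ with saddle point $x^* = -\mu$, $y^* = -\lambda$, and the $\psi_2$-parts of the gradients, $\nabla_\lambda\psi_2 = -y^*$ and $\nabla_\mu\psi_2 = x^*$, can be verified by finite differences:

```python
import numpy as np

def psi2(lam, mu):
    # psi_2 for Phi(x, y) = 0.5*||x||^2 - 0.5*||y||^2 in closed form.
    return 0.5 * lam @ lam - 0.5 * mu @ mu

lam, mu = np.array([1.0, -0.5]), np.array([2.0, 0.3])
x_star, y_star = -mu, -lam           # saddle point of the inner problem
grad_lam, grad_mu = -y_star, x_star  # psi_2-parts of the gradient formulas

# Central finite differences of psi_2 should match the formulas.
eps = 1e-6
fd_lam = np.array([(psi2(lam + eps * e, mu) - psi2(lam - eps * e, mu)) / (2 * eps)
                   for e in np.eye(2)])
fd_mu = np.array([(psi2(lam, mu + eps * e) - psi2(lam, mu - eps * e)) / (2 * eps)
                  for e in np.eye(2)])
```

This kind of check is cheap insurance when implementing the dual oracle for a concrete instance.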

5.1 Estimating the norms of λ∗, µ∗

Let us compute bounds for the norms of the components $\lambda$, $\mu$ of the solution of the conjugate problem (5.7).

Lemma 5.3. Let Assumptions A3, A4 be true. Assume that $P\subset B_{r_1}(u_0)$ and $Q\subset B_{r_2}(v_0)$. Then

$$\|\mu^*\| \le L_{xx}\|B\|r_1 + L_{xy}\|C\|r_2 + \|\nabla_x\Phi(Bu_0+\tilde x_0,\, Cv_0+\tilde y_0)\|$$

and

$$\|\lambda^*\| \le L_{yx}\|B\|r_1 + L_{yy}\|C\|r_2 + \|\nabla_y\Phi(Bu_0+\tilde x_0,\, Cv_0+\tilde y_0)\|.$$

Proof. Consider the function $\psi(\lambda^*,\mu)$. From Lemmas 5.1 and 5.2, we know that this function is concave and smooth in $\mu$ with gradient given by (5.17). Also, the function $\psi(\lambda^*,\mu)$ achieves its maximum at $\mu=\mu^*$. From the corresponding optimality condition, we get $0 = \nabla_\mu\psi(\lambda^*,\mu^*) = x^* - Bu^* - \tilde x_0$. Hence $x^* = Bu^* + \tilde x_0$. Similarly, we have $y^* = Cv^* + \tilde y_0$. We can also find $x^*, y^*$ from problem (5.9), which can be rewritten as

$$\min_x\{\langle\mu^*,x\rangle + \tilde\psi_2(x,\lambda^*)\}, \qquad \tilde\psi_2(x,\lambda^*) = \max_y\{\Phi(x,y) - \langle\lambda^*,y\rangle\}.$$

Since $\Phi(x,y)$ is strongly concave in $y$, from the optimality condition for the first problem we obtain $\mu^* = -\nabla_x\tilde\psi_2(x^*,\lambda^*) = -\nabla_x\Phi(x^*,y^*)$. Finally, we have the following sequence of inequalities:

$$\|\mu^*\| = \|\nabla_x\Phi(x^*,y^*)\| \le \|\nabla_x\Phi(Bu^*+\tilde x_0,\, Cv^*+\tilde y_0) - \nabla_x\Phi(Bu_0+\tilde x_0,\, Cv_0+\tilde y_0)\| + \|\nabla_x\Phi(Bu_0+\tilde x_0,\, Cv_0+\tilde y_0)\|$$
$$\le L_{xx}\|Bu^*-Bu_0\| + L_{xy}\|Cv^*-Cv_0\| + \|\nabla_x\Phi(Bu_0+\tilde x_0,\, Cv_0+\tilde y_0)\|$$
$$\le L_{xx}\|B\|r_1 + L_{xy}\|C\|r_2 + \|\nabla_x\Phi(Bu_0+\tilde x_0,\, Cv_0+\tilde y_0)\|.$$

The proof of the second inequality in the statement is similar.

5.2 Algorithm description

Let us introduce the prox-function $d_\lambda(\lambda) = \frac{\sigma_\lambda}{2}\|\lambda\|^2$, where we use the Euclidean norm. The function $d_\lambda(\lambda)$ is strongly convex in this norm with convexity parameter $\sigma_\lambda$. For the variable $\mu$ we introduce the prox-function $d_\mu(\mu) = \frac{\sigma_\mu}{2}\|\mu\|^2$, which is strongly convex with convexity parameter $\sigma_\mu$ with respect to the Euclidean norm. These prox-functions are differentiable everywhere. For any $\lambda_1,\lambda_2\in\mathbb{R}^n$ we can define the Bregman distance

$$\omega_\lambda(\lambda_1,\lambda_2) = d_\lambda(\lambda_2) - d_\lambda(\lambda_1) - \langle\nabla d_\lambda(\lambda_1),\, \lambda_2-\lambda_1\rangle.$$

Using the explicit expression for $d_\lambda(\lambda)$, we get $\omega_\lambda(\lambda_1,\lambda_2) = \frac{\sigma_\lambda}{2}\|\lambda_1-\lambda_2\|^2$. Let us choose $\bar\lambda = 0$ as the center of the space $\mathbb{R}^n$. Then we have $\omega_\lambda(\bar\lambda,\lambda) = d_\lambda(\lambda)$. For $\mu$ we introduce similar settings.

In the same way as this was done in Section 4.3, we conclude that finding the saddle point $(\lambda^*,\mu^*)$ of the conjugate problem (5.7) is equivalent to solving the variational inequality

$$\langle g(\lambda,\mu),\, (\lambda-\lambda^*,\, \mu-\mu^*)\rangle \ge 0 \quad \forall \lambda,\mu, \qquad (5.18)$$

where

$$g(\lambda,\mu) \stackrel{\mathrm{def}}{=} (\nabla_\lambda\psi(\lambda,\mu),\, -\nabla_\mu\psi(\lambda,\mu)). \qquad (5.19)$$

Let us choose some $\kappa\in(0,1)$. Consider the space of $z = (\lambda,\mu)$ with the norm

$$\|z\| = \sqrt{\kappa\sigma_\lambda\|\lambda\|^2 + (1-\kappa)\sigma_\mu\|\mu\|^2},$$

the oracle $g(z) = (\nabla_\lambda\psi(\lambda,\mu),\, -\nabla_\mu\psi(\lambda,\mu))$, and a new prox-function $d(z) = \kappa d_\lambda(\lambda) + (1-\kappa)d_\mu(\mu)$, which is strongly convex with constant $\sigma_0 = 1$. We define $W = \mathbb{R}^n\times\mathbb{R}^m$, the Bregman distance $\omega(z_1,z_2) = \kappa\omega_\lambda(\lambda_1,\lambda_2) + (1-\kappa)\omega_\mu(\mu_1,\mu_2)$, which has the explicit form $\omega(z_1,z_2) = d(z_1-z_2)$, and the center $\bar z = (0,0)$. Then we have $\omega(\bar z, z) = d(z)$. Note that the norm in the dual space is defined as

$$\|g\|_* = \sqrt{\frac{1}{\kappa\sigma_\lambda}\|g_\lambda\|^2 + \frac{1}{(1-\kappa)\sigma_\mu}\|g_\mu\|^2}.$$
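To make the weighted norm and its dual concrete, here is a small numerical sketch (the values of $\kappa$, $\sigma_\lambda$, $\sigma_\mu$ below are illustrative, not from the paper). It checks the duality relation: rescaling a point by the norm weights maps the primal norm exactly onto the dual norm.

```python
import numpy as np

# Illustrative constants (not from the paper).
kappa, sigma_lam, sigma_mu = 0.25, 2.0, 1.5

def primal_norm(lam, mu):
    # ||z|| = sqrt(kappa*sigma_lam*||lam||^2 + (1-kappa)*sigma_mu*||mu||^2)
    return np.sqrt(kappa * sigma_lam * lam @ lam
                   + (1 - kappa) * sigma_mu * mu @ mu)

def dual_norm(g_lam, g_mu):
    # ||g||_* = sqrt(||g_lam||^2/(kappa*sigma_lam) + ||g_mu||^2/((1-kappa)*sigma_mu))
    return np.sqrt(g_lam @ g_lam / (kappa * sigma_lam)
                   + g_mu @ g_mu / ((1 - kappa) * sigma_mu))

lam, mu = np.array([1.0, -2.0]), np.array([0.5, 3.0, -1.0])
# The dual norm of the weighted point equals the primal norm of the point.
lhs = dual_norm(kappa * sigma_lam * lam, (1 - kappa) * sigma_mu * mu)
rhs = primal_norm(lam, mu)
```

This identity is what makes the feasibility estimates below, stated in the dual norm, translate into the separate bounds on the $\lambda$- and $\mu$-components.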

Lemma 5.4. Let Assumptions A3, A4 be true, and let $\kappa = \frac{\sigma_\mu}{\sigma_\mu+\sigma_\lambda}$. Then the operator $g(z)$ defined in (5.19) is Lipschitz continuous:

$$\|g(z_1)-g(z_2)\| \le L\|z_1-z_2\| \qquad (5.20)$$

with

$$L = \frac{\sigma_\lambda+\sigma_\mu}{\sigma_\mu\sigma_\lambda}\sqrt{2\left(\frac{\|C\|^2}{\sigma_F^v} + \frac{1}{\sigma_\Phi^y} + \frac{\|B\|\,\|C\|\,L_{vu}}{\sigma_F^u\sigma_F^v} + \frac{L_{yx}}{\sigma_\Phi^x\sigma_\Phi^y}\right)\left(\frac{\|B\|\,\|C\|\,L_{uv}}{\sigma_F^u\sigma_F^v} + \frac{L_{xy}}{\sigma_\Phi^x\sigma_\Phi^y} + \frac{\|B\|^2}{\sigma_F^u} + \frac{1}{\sigma_\Phi^x}\right)}. \qquad (5.21)$$
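The bound (5.21) is straightforward to evaluate numerically. The following function is a direct transcription of the formula; the argument names are ours, not the paper's.

```python
from math import sqrt

def lipschitz_constant(nB, nC, L_uv, L_vu, L_xy, L_yx,
                       sFu, sFv, sPx, sPy, s_lam, s_mu):
    """Transcription of the bound (5.21) for the Lipschitz constant of g(z).
    nB, nC are the operator norms ||B||, ||C||; the remaining arguments are
    the Lipschitz and strong convexity/concavity constants from A3, A4 and
    the prox-function parameters sigma_lambda, sigma_mu."""
    first = (nC**2 / sFv + 1.0 / sPy
             + nB * nC * L_vu / (sFu * sFv) + L_yx / (sPx * sPy))
    second = (nB * nC * L_uv / (sFu * sFv) + L_xy / (sPx * sPy)
              + nB**2 / sFu + 1.0 / sPx)
    return (s_lam + s_mu) / (s_mu * s_lam) * sqrt(2.0 * first * second)

# With all constants equal to 1, both factors equal 4, so L = 2*sqrt(32) = 8*sqrt(2).
L_all_ones = lipschitz_constant(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
```

In practice this value (or any upper estimate of it) is what gets plugged into the step-size parameter $\beta$ of the method below.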

Proof. Denote

$$c = \|\lambda_1-\lambda_2\|, \qquad d = \|\mu_1-\mu_2\|,$$
$$\alpha_1 = \frac{\|C\|^2}{\sigma_F^v} + \frac{1}{\sigma_\Phi^y}, \qquad \beta_1 = \frac{\|B\|\,\|C\|\,L_{vu}}{\sigma_F^u\sigma_F^v} + \frac{L_{yx}}{\sigma_\Phi^x\sigma_\Phi^y},$$
$$\alpha_2 = \frac{\|B\|\,\|C\|\,L_{uv}}{\sigma_F^u\sigma_F^v} + \frac{L_{xy}}{\sigma_\Phi^x\sigma_\Phi^y}, \qquad \beta_2 = \frac{\|B\|^2}{\sigma_F^u} + \frac{1}{\sigma_\Phi^x}.$$

Then from equations (5.10), (5.11), (5.14), (5.15) we have

$$\|\nabla_\lambda\psi(\lambda_1,\mu_1)-\nabla_\lambda\psi(\lambda_2,\mu_2)\|^2 \le (\alpha_1 c+\beta_1 d)^2, \qquad \|\nabla_\mu\psi(\lambda_1,\mu_1)-\nabla_\mu\psi(\lambda_2,\mu_2)\|^2 \le (\alpha_2 c+\beta_2 d)^2.$$

Since $\kappa = \frac{\sigma_\mu}{\sigma_\mu+\sigma_\lambda}$, we get

$$\kappa\sigma_\lambda = (1-\kappa)\sigma_\mu = \frac{\sigma_\mu\sigma_\lambda}{\sigma_\mu+\sigma_\lambda} \stackrel{\mathrm{def}}{=} \sigma.$$

Using the above expressions, we obtain

$$\|g(z_1)-g(z_2)\|^2 = \frac{1}{\kappa\sigma_\lambda}\|\nabla_\lambda\psi(\lambda_1,\mu_1)-\nabla_\lambda\psi(\lambda_2,\mu_2)\|^2 + \frac{1}{(1-\kappa)\sigma_\mu}\|\nabla_\mu\psi(\lambda_1,\mu_1)-\nabla_\mu\psi(\lambda_2,\mu_2)\|^2$$
$$\le \frac{1}{\sigma}(\alpha_1 c+\beta_1 d)^2 + \frac{1}{\sigma}(\alpha_2 c+\beta_2 d)^2 \le \frac{2}{\sigma}(\alpha_1 c+\beta_1 d)(\alpha_2 c+\beta_2 d)$$
$$= \frac{2}{\sigma}\big(\alpha_1\alpha_2 c^2 + \beta_1\beta_2 d^2 + (\alpha_1\beta_2+\alpha_2\beta_1)cd\big) \le \frac{1}{\sigma}\big((2\alpha_1\alpha_2+\alpha_1\beta_2+\alpha_2\beta_1)c^2 + (2\beta_1\beta_2+\alpha_1\beta_2+\alpha_2\beta_1)d^2\big)$$
$$\le \frac{2}{\sigma}\Big(\sqrt{\alpha_1\alpha_2(\alpha_1+\beta_1)(\alpha_2+\beta_2)}\,c^2 + \sqrt{\beta_1\beta_2(\alpha_1+\beta_1)(\alpha_2+\beta_2)}\,d^2\Big)$$
$$\le \frac{2(\alpha_1+\beta_1)(\alpha_2+\beta_2)}{\sigma^2}\big(\kappa\sigma_\lambda c^2 + (1-\kappa)\sigma_\mu d^2\big) = \frac{2(\alpha_1+\beta_1)(\alpha_2+\beta_2)}{\sigma^2}\|z_1-z_2\|^2.$$

Thus, we get that $g(z)$ is Lipschitz continuous with

$$L = \frac{\sqrt{2(\alpha_1+\beta_1)(\alpha_2+\beta_2)}}{\sigma} = \frac{\sigma_\lambda+\sigma_\mu}{\sigma_\mu\sigma_\lambda}\sqrt{2\left(\frac{\|C\|^2}{\sigma_F^v} + \frac{1}{\sigma_\Phi^y} + \frac{\|B\|\,\|C\|\,L_{vu}}{\sigma_F^u\sigma_F^v} + \frac{L_{yx}}{\sigma_\Phi^x\sigma_\Phi^y}\right)\left(\frac{\|B\|\,\|C\|\,L_{uv}}{\sigma_F^u\sigma_F^v} + \frac{L_{xy}}{\sigma_\Phi^x\sigma_\Phi^y} + \frac{\|B\|^2}{\sigma_F^u} + \frac{1}{\sigma_\Phi^x}\right)}.$$

In accordance with [5], for solving (5.18) we can use the following method:

1. Initialization: Fix $\beta = \frac{L}{\sigma_0}$. Set $s_{-1} = 0$.

2. Iteration ($k\ge 0$):
Compute $x_k = T_\beta(\bar z, s_{k-1})$.
Compute $z_k = T_\beta(x_k, -g(x_k))$.                                  (M2)
Set $s_k = s_{k-1} - g(z_k)$.

Here

$$T_\beta(z,s) = \arg\max_{x\in W}\{\langle s,\, x-z\rangle - \beta\omega(z,x)\}.$$

Similarly to [4], we can prove that method (M2) generates a bounded sequence $\{z_i\}_{i\ge0}$. Hence the sequences $\{\lambda_i\}_{i\ge0}$, $\{\mu_i\}_{i\ge0}$ are also bounded. Moreover, since the saddle point in problem (3.4) exists, there exists a saddle point $(\lambda^*,\mu^*)$ of the conjugate problem (5.7). These arguments allow us to choose $D_\lambda$, $D_\mu$ such that $d_\lambda(\lambda_i)\le D_\lambda$, $d_\mu(\mu_i)\le D_\mu$ for all $i\ge0$, and such that $(\lambda^*,\mu^*)$ is an interior solution:

$$B_{r/\sqrt{\kappa\sigma_\lambda}}(\lambda^*) \subseteq W_\lambda \stackrel{\mathrm{def}}{=} \{\lambda : d_\lambda(\lambda)\le D_\lambda\},$$
$$B_{r/\sqrt{(1-\kappa)\sigma_\mu}}(\mu^*) \subseteq W_\mu \stackrel{\mathrm{def}}{=} \{\mu : d_\mu(\mu)\le D_\mu\}$$

for some $r>0$. Then we have $z^* = (\lambda^*,\mu^*) \in F_D \stackrel{\mathrm{def}}{=} \{z\in W : d(z)\le D\}$ with $D = \kappa D_\lambda + (1-\kappa)D_\mu$ and $B_r(z^*)\subseteq F_D$.

From Theorem 1 in [5], similarly to the proof of Theorem 2 in [5], we get the following lemma.

Lemma 5.5. Assume that the operator $g(z)$ is Lipschitz continuous on $W$ with constant $L$. Let the sequence $\{z_i\}_{i\ge0}$ be generated by method (M2). Then for any $k\ge0$ we have

$$\delta_k(D) \le \frac{LD}{\sigma_0}, \qquad (5.22)$$

where $\delta_k(D)$ is defined in (4.14).

Theorem 5.1. Let Assumptions A3 and A4 be true, let $\kappa = \frac{\sigma_\mu}{\sigma_\mu+\sigma_\lambda}$, and let $L$ be defined in (5.21). Let the points $z_i = (\lambda_i,\mu_i)$, $i\ge0$, be generated by method (M2), and let the points in (4.16) be defined by the points $(u_i,v_i)$, $(x_i,y_i)$, which are the saddle points at $(\lambda_i,\mu_i)$ in (5.8) and (5.9), respectively. Then for the functions $\xi(u,x)$, $\eta(v,y)$ defined in (4.20) and (4.21) we have

$$\xi(\hat u_{k+1},\hat x_{k+1}) - \eta(\hat v_{k+1},\hat y_{k+1}) \le \frac{LD}{k+1}. \qquad (5.23)$$

Also the following is true:

$$\|B\hat u_{k+1}+\tilde x_0-\hat x_{k+1}\| \le \frac{LD\sqrt{\sigma_\mu}}{r(k+1)}, \qquad \|C\hat v_{k+1}+\tilde y_0-\hat y_{k+1}\| \le \frac{LD\sqrt{\sigma_\lambda}}{r(k+1)}. \qquad (5.24)$$

Proof. Similarly to the proof of Theorem 4.1, we conclude that

$$\xi(\hat u_{k+1},\hat x_{k+1}) - \eta(\hat v_{k+1},\hat y_{k+1}) \le \frac{\delta_k(D)}{k+1},$$
$$\|B\hat u_{k+1}+\tilde x_0-\hat x_{k+1}\| \le \frac{\delta_k(D)\sqrt{\sigma_\mu}}{r(k+1)}, \qquad \|C\hat v_{k+1}+\tilde y_0-\hat y_{k+1}\| \le \frac{\delta_k(D)\sqrt{\sigma_\lambda}}{r(k+1)}.$$

Using Lemma 5.5 and the fact that $\sigma_0 = 1$, these inequalities prove the statement of the theorem.

References

[1] P. J. Antsaklis and A. N. Michel. Linear Systems. Birkhäuser (2006).

[2] O. Devolder, F. Glineur, and Yu. Nesterov. Double smoothing technique for large-scale linearly constrained convex optimization. SIAM Journal on Optimization, Vol. 22, No. 2, pp. 702-727 (2012).

[3] M. Sion. On general minimax theorems. Pacific Journal of Mathematics, Vol. 8, No. 1, pp. 171-176 (1958). http://projecteuclid.org/euclid.pjm/1103040253

[4] Yu. Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, Vol. 120, No. 1, pp. 221-259 (2009).

[5] Yu. Nesterov. Dual extrapolation and its applications for solving variational inequalities and related problems. Mathematical Programming, Series B, Vol. 109, No. 2, pp. 319-344 (2007).
