F-Lipschitz Optimization with Wireless Sensor Networks Applications


F-Lipschitz Optimization with Wireless Sensor Networks Applications

Carlo Fischione
Automatic Control and ACCESS Linnaeus Center, Electrical Engineering
KTH Royal Institute of Technology, 10044 Stockholm, Sweden
[email protected]

Abstract

Motivated by the need for fast computations in wireless sensor networks, the new F-Lipschitz optimization theory is introduced for a novel class of optimization problems. These problems are defined by simple qualifying properties specified in terms of an increasing objective function and contractive constraints. It is shown that feasible F-Lipschitz problems always have a unique optimal solution that satisfies the constraints at equality. The solution is obtained quickly by asynchronous algorithms of certified convergence. F-Lipschitz optimization can be applied to both centralized and distributed optimization. Compared to traditional Lagrangian methods, which often converge linearly, the convergence rate of centralized F-Lipschitz algorithms is at least superlinear. Distributed F-Lipschitz algorithms converge fast, as opposed to traditional Lagrangian decomposition and parallelization methods, which generally converge slowly and at the price of much message passing. In both cases, the computational complexity is much lower than that of traditional Lagrangian methods. Examples of application of the new optimization method are given for distributed estimation and radio power control in wireless sensor networks. The drawback of F-Lipschitz optimization is that it might be difficult to check the qualifying properties. For more general optimization problems, it is suggested that it is convenient to have conditions ensuring that the solution satisfies the constraints at equality.

Index Terms

Wireless Sensor Networks; Convex and Non-convex Optimization; Distributed Optimization; Multi-Objective Optimization; Parallel and Distributed Computation; Interference Function Theory.

The author is grateful to A. Speranzon, for the numerous and fruitful discussions on distributed estimation topics, which have led to the development of F-Lipschitz optimization, to U. Jönsson, who contributed to the proof of the case (III.2d) – (III.2f) and carefully proofread the paper, and to S. Boyd and M. Johansson for stimulating discussions on the topics of this paper. The author acknowledges the support of the Swedish Research Council and of the EU project FeedNetBack. A preliminary conference version of this paper appeared in ACM/IEEE IPSN 11.


I. Introduction

Wireless sensor networks, with their applications to smart grids, water distribution, and vehicular networks, are networked systems in which decision variables must be quickly optimized by algorithms of light computational cost. Unfortunately, many traditional optimization methods are difficult to use in wireless sensor networks due to the complex operations or the high number of messages that have to be exchanged among nodes to compute the optimal solution [1]. Wireless sensor networks are characterized by small and cheap hardware platforms, and thus have limited computational capabilities. It follows that there is a need for fast and simple computations, robust to errors and noise, to solve optimization problems, both in a centralized and in a distributed set-up [2] – [5].

Parallel and distributed computation is the basis for distributed optimization [6], a topic that has recently attracted much attention [3], [7] – [11]. However, distributed optimization algorithms from the literature above are often based on decomposition and parallelization methods or on intermediate consensus iterations, where an iterate-and-broadcast procedure injects huge amounts of control messages into the network. The amount of these messages usually exceeds the number of source data messages by orders of magnitude. Moreover, decomposition and parallelization methods suffer from slow convergence [7]. While this may be fine for networks of processors or of wires, it is unacceptable for wireless networks, where transmitting information demands about one hundred times the energy needed to perform computations [3], and the convergence speed is further degraded by the presence of message losses.

In this paper, we propose a new optimization approach. We define a new class of problems that is characterized by the availability of fast and simple algorithms for the computation of the optimal solution. We denote it F-Lipschitz to emphasize that it is fast, hence "F", and based on a Lipschitz property of the constraints. It is assumed that the objective function is increasing, whereas the constraints can be transformed into contractive Lipschitz functions. To compute the optimal solution of centralized optimization problems, F-Lipschitz algorithms do not require Lagrangian methods, but superlinear iterations based on the solution of a system of equations given by the constraints, as illustrated in Fig. 1. We show that F-Lipschitz optimization solves much more efficiently problems traditionally solved by Lagrangian methods. In distributed optimization, we propose an algorithm that does not require Lagrangian decomposition and parallelization methods, but simple asynchronous iterative methods. We show that the computation of the optimal solution of an F-Lipschitz problem is robust to quantization errors and not sensitive to perturbations of the constraints.

[Fig. 1 is a decision flowchart: if the inequality constraints are active at the optimum, the solution is computed by F-Lipschitz algorithms; otherwise, it is computed by Lagrangian methods.]

Fig. 1. F-Lipschitz optimization defines a class of optimization problems for which all the constraints are satisfied at equality when computed at the optimal solution, including the inequality constraints. The solution to the set of equations given by the projected constraints is the optimal solution, which avoids using Lagrangian methods. These methods are computationally much more expensive, particularly for distributed optimization over wireless sensor networks. The challenging part of F-Lipschitz optimization is the availability of conditions ensuring that the constraints are active at the optimum without knowing the optimal solution in advance. For this reason, we restrict ourselves to problems in the form (III.1) presented in Section III. However, the method can be used for problems in the more general form (IV.1), as we see in Section IV.

The approach presented in this paper covers several problems of the general interference function theory [12]–[14]. Such a theory is based on the assumption that the constraints of the radio power optimization problem have a special form, commonly referred to as an "interference function". Fig. 2 illustrates the intersection between F-Lipschitz optimization and other classic optimization areas.

The remainder of the paper is organized as follows: In Section II, two motivating wireless sensor network examples are presented. The definition of F-Lipschitz optimization is given in Section III, along with properties and algorithms to compute the optimal solution. In Section IV, we show how to convert canonical optimization problems into the F-Lipschitz form. In Section V, we show that the motivating examples of Section II are F-Lipschitz and we illustrate how they are solved. Finally, conclusions and future perspectives are given in Section VI.

A. Notation

We use the notation $\mathbb{R}^n_+$ to denote the set of strictly positive real vectors. By $(\cdot)^T$ we denote the transpose of a vector or of a matrix. By $|\cdot|$ we denote the absolute value of a real number. For $x \in \mathbb{R}^n$ we let
$$\|x\|_1 = \sum_{k=1}^{n} |x_k| \qquad \text{and} \qquad \|x\|_\infty = \max_{1 \le k \le n} |x_k|\,.$$


For a matrix $A \in \mathbb{R}^{n \times n}$ we use the norms $\|\cdot\|_1$ and $\|\cdot\|_\infty$ defined as
$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}| \qquad \text{and} \qquad \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}|\,.$$

The spectral radius of a matrix is defined as $\rho(A) = \max\{|\lambda| : \lambda \in \operatorname{eig}(A)\}$, where $\operatorname{eig}(A)$ denotes the set of eigenvalues of $A$.
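For concreteness, here is a small reference sketch (ours, not from the paper) of how these norms and the spectral radius can be computed numerically:

```python
import numpy as np

def norm_1(A):
    # ||A||_1: maximum absolute column sum
    return np.abs(A).sum(axis=0).max()

def norm_inf(A):
    # ||A||_inf: maximum absolute row sum
    return np.abs(A).sum(axis=1).max()

def spectral_radius(A):
    # rho(A) = max |lambda| over the eigenvalues of A
    return max(abs(lam) for lam in np.linalg.eigvals(A))
```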

By $a \succeq b$ and $a \succ b$ we denote the element-wise inequalities between the vectors $a$ and $b$. A matrix is called positive, $A \succeq 0$, if $a_{ij} \ge 0$ $\forall i, j$. By $I$ and $\mathbf{1}$ we denote the identity matrix and the vector $(1, \ldots, 1)^T$, respectively, whose dimensions are clear from the context. Given the set $\mathcal{D} = [x_{1,\min}, x_{1,\max}] \times [x_{2,\min}, x_{2,\max}] \times \ldots \times [x_{n,\min}, x_{n,\max}] \subset \mathbb{R}^n$, with $-\infty < x_{i,\min} < x_{i,\max} < \infty$ for $i = 1, \ldots, n$, and the vector $x \in \mathcal{D}$, we use the notation $[x_i]_\mathcal{D}$ to denote the orthogonal projection, with respect to the Euclidean norm, of the $i$-th component of the vector $x$ onto the $i$-th component of the closed set $\mathcal{D}$, namely $[x_i]_\mathcal{D} = x_i$ if $x_i \in [x_{i,\min}, x_{i,\max}]$, $[x_i]_\mathcal{D} = x_{i,\min}$ if $x_i < x_{i,\min}$, and $[x_i]_\mathcal{D} = x_{i,\max}$ if $x_i > x_{i,\max}$.
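The projection $[x_i]_\mathcal{D}$ is simply a clamping of each component to its box; a minimal sketch (the helper name is ours, not the paper's):

```python
def project(x_i, x_min_i, x_max_i):
    # orthogonal projection of x_i onto [x_min_i, x_max_i]
    return min(max(x_i, x_min_i), x_max_i)
```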

$\nabla$ denotes the gradient operator. Given a scalar function $f(x): \mathbb{R}^n \to \mathbb{R}$,
$$\nabla f(x) = \left(\frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_n}\right)^T.$$
Given a vector function $F(x): \mathbb{R}^n \to \mathbb{R}^n$, we use the gradient matrix definition
$$\nabla F(x) = \bigl[\nabla F_1(x) \ \ldots \ \nabla F_n(x)\bigr]\,,$$
which is the transpose of the Jacobian matrix.

When we study multi-objective or vector optimization problems, we always consider Pareto optimal solutions. Therefore, we use the term "optimal solution" to mean "Pareto optimal solution" (see the definition in Section III below). A point $x$ is feasible for an optimization problem if it satisfies all the constraints of the problem.

II. Motivating Examples

In this section we describe some motivating examples where there is a need for fast optimization in wireless sensor networks. We argue that these problems cannot be solved efficiently with traditional approaches.

[Fig. 2 is a Venn diagram showing F-Lipschitz optimization intersecting non-convex optimization, convex optimization, interference function optimization, and geometric programming.]

Fig. 2. F-Lipschitz optimization theory includes the interference function optimization of type I [12] for smooth functions, and, for example, part of convex optimization and geometric programming [15]. It follows that optimization problems previously solved by, e.g., convex solvers, that are F-Lipschitz, can now be solved much more efficiently by the methods proposed in this paper.

A. Distributed Conditions for the Computation of a Norm

Consider a wireless sensor network of $n$ nodes that communicate through some channel with possible loss of messages. In the areas of distributed consensus, distributed estimation and data fusion (see, e.g., [16] – [19] and references therein), it is common that this network of nodes needs to compute a matrix $K = [k_i]$ that is stable with respect to some norm, where each row $k_i$, $i = 1, \ldots, n$, of the matrix belongs to a node [20]. Nodes could impose that $\|k_i\|_{\max} < 1$, so that $\|K\|_{\max} < 1$ and the matrix is stable. However, it can be shown that this choice gives bad performance [18]. As a result, ensuring that a matrix has a bounded norm by local conditions is a challenging task. To overcome this problem, the following result was derived:

Proposition 2.1 ([18]): Let $K = [k_i] \in \mathbb{R}^{n \times n}$, where $k_i \in \mathbb{R}^{1 \times n}$. Let $0 < \gamma_{\max} < 1$. Suppose there exists a vector $x = (x_1, x_2, \ldots, x_n)^T \succ 0$ such that
$$x_i + \sqrt{x_i} \sum_{j \in \Theta_i} \sqrt{x_j} \le \gamma_{\max}\,, \qquad \text{(II.1)}$$
for all $i = 1, \ldots, n$, where the set $\Theta_i$ is the collection of communicating nodes located at two-hop distance from node $i$, plus the neighbors of $i$. If $\|k_i\|_2 \le x_i$, $i = 1, \ldots, n$, then $\|K\|_2 \le \gamma_{\max}$.
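As an illustration, the following sketch (not from the paper) checks whether a candidate threshold vector satisfies condition (II.1); `thresholds` plays the role of $x$ and `neighborhoods[i]` the role of $\Theta_i$:

```python
import math

def satisfies_condition_II1(thresholds, neighborhoods, gamma_max):
    for i, x_i in enumerate(thresholds):
        lhs = x_i + math.sqrt(x_i) * sum(math.sqrt(thresholds[j])
                                         for j in neighborhoods[i])
        if lhs > gamma_max:
            return False
    return True
```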

This proposition suggests that, given some thresholds $x_i > 0$ satisfying a set of non-linear inequalities, as long as the norms of the rows $k_i$ are not above the thresholds $x_i$, for $i = 1, \ldots, n$, the matrix $K$ is stable. Obviously, the condition $\|k_i\|_2 \le x_i$ leaves much freedom in choosing the single elements of the vector $k_i$, but the smaller $x_i$, the smaller these elements will be. In order not to penalize any node, it is useful to solve the optimization problem

$$\begin{aligned}
\max_x \quad & \mathbf{1}^T x & \text{(II.2a)}\\
\text{s.t.} \quad & x_i + \sqrt{x_i} \sum_{j \in \Theta_i} \sqrt{x_j} \le \gamma_{\max}\,, \quad i = 1, \ldots, n & \text{(II.2b)}\\
& x \in \mathbb{R}^n_+\,.
\end{aligned}$$
Such a problem is a non-convex vector optimization problem. No closed-form solution is available. One would have to show that strong duality holds for the associated scalarized problem and then use the Karush-Kuhn-Tucker (KKT) conditions [21]. In a distributed set-up, the solution might be computed via standard decomposition methods [6]. The possible presence of message losses may introduce large delays, as information associated with primal and dual variables must be transmitted among nodes, as is usually done by these Lagrangian methods. An alternative method is needed, which is the F-Lipschitz optimization that we introduce in Section III. It then becomes possible to find algorithms that compute the solution much faster than Lagrangian methods, both in a centralized and in a distributed set-up.

B. Radio Power Allocation with Intermodulation Powers

In wireless systems, the problem of allocating the transmit radio powers is cast as an optimization problem. The radio power used to transmit signals must be minimized to reduce the nodes' energy consumption and the interference caused to other wireless transmissions. At the same time, the radio powers should be high enough to allow the receivers to successfully detect the transmitted signals. For illustrative purposes, we consider the basic problem of radio power transmission that is of interest for sensor networks. However, more advanced radio problems can be investigated, such as those listed in [12]. We consider a high data rate wireless sensor network of $n$ transmitter nodes, where node $i$ transmits at a radio power $p_i$, $i = 1, \ldots, n$. Since the data rate is high, it is meaningful to perform radio power control to save energy and prolong the lifetime of the sensor nodes. Let $p \in \mathbb{R}^n$ be the vector that contains the radio powers. In the network, there are $n$ receiver nodes, where node $i$ receives the power $G_{ii} p_i$ of the signal from transmitter $i$; $G_{ii}$ is the channel gain from transmitter node $i$ to receiver node $i$. Receiver node $i$ is also subject to interference from the signals of the other transmitters, which is $\sum_{k \ne i} G_{ik} p_k$, and to the thermal noise $\sigma_i$. The signal to interference plus noise ratio of the $i$-th transmitter-receiver pair is defined as [15]
$$\mathrm{SINR}_i = \frac{G_{ii} p_i}{\sigma_i + \sum_{k \ne i} G_{ik} p_k + \sum_{k \ne i} M_{ik}\, p_i^2 p_k^2}\,,$$
where $G_{ik}$ models the wireless channel gain between transmitter $k$ and receiver $i$, and $M_{ik}$ is an intermodulation term introduced by the amplifier of the receiver. Typically, these intermodulation terms are present when the amplifiers are built out of cheap and unreliable components, which may be typical for wireless sensor nodes. $M_{ik}$, $k \ne i$, assumes values smaller than $G_{ik}$, and can take on both positive and negative values.
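For illustration, a small sketch (ours, not the paper's) that evaluates this SINR for a given power vector; `G`, `M`, and `sigma` are illustrative inputs in linear scale:

```python
import numpy as np

def sinr(p, G, M, sigma):
    n = len(p)
    out = np.empty(n)
    for i in range(n):
        interference = sum(G[i, k] * p[k] for k in range(n) if k != i)
        intermod = sum(M[i, k] * p[i]**2 * p[k]**2 for k in range(n) if k != i)
        out[i] = G[i, i] * p[i] / (sigma[i] + interference + intermod)
    return out
```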

The power optimization problem is usually written in the following form:
$$\begin{aligned}
\min_p \quad & p & \text{(II.3)}\\
\text{s.t.} \quad & \mathrm{SINR}_i \ge S_{\min}\,, \quad i = 1, \ldots, n\,,\\
& p_{\min} \mathbf{1} \preceq p \preceq p_{\max} \mathbf{1}\,,
\end{aligned}$$
where $S_{\min}$ is the minimum required SINR and ensures that the signal of transmitter $i$ can be received with the desired quality. The box constraints on the radio powers are naturally due to the fact that transmitters have a minimum and a maximum power level that can be used. Notice that the solution to this problem must be achieved in a distributed fashion, namely every node must be able to compute its own radio power, because there is no time for node $i$ to send the wireless channel coefficients $G_{ij}$, which are measured at node $i$, to a central solver and wait for the optimal power to use: in the meantime $G_{ij}$ would have changed due to the dynamics of the wireless channel, and the optimal solution would be outdated [22].

Problem (II.3) is an unsolved optimization problem in communication theory. The interference function theory [12], [13], which is based on the monotonicity and scalability properties of the interference, is the fundamental reference for solving these problems. However, it cannot be used here because the intermodulation coefficients can be negative, and this makes the interference term not monotonic and scalable. By the same argument, the geometric programming theory, which has also been widely employed to solve power allocation problems [23], cannot be used. Moreover, the problem is not convex. The classical approach would be to use parallelization and decomposition methods [6], provided that strong duality holds. Then, one would use iterative computations of the primal decision variables and dual variables until convergence is achieved. This makes it hard to compute the solution of problem (II.3), particularly when such a solution has to be computed by nodes of reduced computational capability.

An alternative theory is needed. This theory is developed in the following.

III. F-Lipschitz Optimization

In this section we give the definition of an F-Lipschitz optimization problem, we characterize the existence and uniqueness of the optimal solution, and we give algorithms to compute such a solution both in closed form, when possible, and numerically. We also characterize several features of the new optimization, including sensitivity analysis and robustness to quantization.

Definition 3.1 (F-Lipschitz optimization): An F-Lipschitz optimization problem is defined as
$$\begin{aligned}
\max_x \quad & f_0(x) & \text{(III.1a)}\\
\text{s.t.} \quad & x_i \le f_i(x)\,, \quad i = 1, \ldots, l & \text{(III.1b)}\\
& x_i = h_i(x)\,, \quad i = l+1, \ldots, n & \text{(III.1c)}\\
& x \in \mathcal{D}\,, & \text{(III.1d)}
\end{aligned}$$
where $\mathcal{D} \subset \mathbb{R}^n$ is a non-empty, convex, and compact set, $l \le n$, with objective and constraints being continuously differentiable functions such that
$$f_0(x): \mathcal{D} \to \mathbb{R}^m\,, \ m \ge 1\,, \qquad f_i(x): \mathcal{D} \to \mathbb{R}\,, \ i = 1, \ldots, l\,, \qquad h_i(x): \mathcal{D} \to \mathbb{R}\,, \ i = l+1, \ldots, n\,.$$
Let $f(x) = [f_1(x), f_2(x), \ldots, f_l(x)]^T$, $h(x) = [h_{l+1}(x), h_{l+2}(x), \ldots, h_n(x)]^T$, and let
$$F(x) = [F_i(x)] = \begin{bmatrix} f(x) \\ h(x) \end{bmatrix},$$
with $F_i(x): \mathcal{D} \to \mathbb{R}$. The meaning of max in the optimization is in the Pareto optimal sense, which we specify below in Definition 3.2.


The following properties must be verified:
$$\begin{aligned}
&\text{1.a} \quad \nabla f_0(x) \succ 0\,, \ \text{i.e., } f_0(x) \text{ is strictly increasing,} && \text{(III.2a)}\\
&\text{1.b} \quad \|\nabla F(x)\| < 1 \ \text{for some norm } \|\cdot\|\,, && \text{(III.2b)}
\end{aligned}$$
and either
$$\text{2.a} \quad \nabla_j F_i(x) \ge 0 \ \forall i, j\,, \qquad \text{(III.2c)}$$
or
$$\begin{aligned}
&\text{3.a} \quad f_0(x) = c \mathbf{1}^T x\,, \quad c \in \mathbb{R}_+\,, && \text{(III.2d)}\\
&\text{3.b} \quad \nabla_j F_i(x) \le 0 \ \forall i, j\,, && \text{(III.2e)}\\
&\text{3.c} \quad \|\nabla F(x)\|_\infty < 1\,, && \text{(III.2f)}
\end{aligned}$$

or
$$\begin{aligned}
&\text{4.a} \quad f_0(x) \in \mathbb{R}\,, && \text{(III.2g)}\\
&\text{4.b} \quad \|\nabla F(x)\|_\infty < \frac{\delta}{\delta + \Delta}\,. && \text{(III.2h)}
\end{aligned}$$
[...] $\gamma_i > 0$, $i = 1, \ldots, l$, and $\mu_i \in \mathbb{R}$, $\mu_i \ne 0$, $i = l+1, \ldots, n$. We let $G(x) = [g_1(x), \ldots, g_l(x), p_{l+1}(x), \ldots, p_n(x)]^T$. Problem (IV.1) and problem (IV.2) have the same optimal solution because the constraints of problem (IV.2) hold if and only if the constraints of problem (IV.1) hold, since $\gamma_i > 0$ and $\mu_i \ne 0$ $\forall i$.

Clearly, problem (IV.2) is in general not an F-Lipschitz one. It is interesting to establish when problem (IV.2) is F-Lipschitz, namely when the qualifying properties (III.2a) – (III.2h) are satisfied for (IV.2). We have the following result:

Theorem 4.1: Consider the optimization problems (IV.1) and (IV.2). Suppose that $\forall x \in \mathcal{D}$
$$\begin{aligned}
&\text{1.a} \quad \nabla g_0(x) \prec 0\,, && \text{(IV.6a)}\\
&\text{1.b} \quad \nabla_i G_i(x) > 0 \ \forall i\,, && \text{(IV.6b)}
\end{aligned}$$
and either
$$\begin{aligned}
&\text{2.a} \quad \nabla_j G_i(x) \le 0 \ \forall j \ne i\,, && \text{(IV.6c)}\\
&\text{2.b} \quad \nabla_i G_i(x) > \sum_{j \ne i} |\nabla_i G_j(x)| \ \forall i\,, && \text{(IV.6d)}
\end{aligned}$$
or
$$\begin{aligned}
&\text{3.a} \quad g_0(x) = -c \mathbf{1}^T x\,, \quad c \in \mathbb{R}_+\,, && \text{(IV.6e)}\\
&\text{3.b} \quad \nabla_j G_i(x) \ge 0 \ \forall j \ne i\,, && \text{(IV.6f)}\\
&\text{3.c} \quad \nabla_i G_i(x) > \sum_{j \ne i} |\nabla_i G_j(x)| \ \forall i\,, && \text{(IV.6g)}
\end{aligned}$$
or
$$\begin{aligned}
&\text{4.a} \quad g_0(x) \in \mathbb{R}\,, && \text{(IV.6h)}\\
&\text{4.b} \quad \frac{\delta}{\delta + \Delta}\, \nabla_i G_i(x) > \sum_{j \ne i} |\nabla_i G_j(x)| \ \forall i\,, && \text{(IV.6i)}
\end{aligned}$$
where $\delta$ and $\Delta$ are defined in Eqs. (III.2i) and (III.2j). Then, problem (IV.2) is F-Lipschitz.

Proof: We show that, by the conditions of the theorem, the F-Lipschitz properties are satisfied for the optimization problem (IV.2). Condition (IV.6a) implies that condition (III.2a) of an F-Lipschitz optimization problem is verified. Condition (IV.6a) also allows the use of the transformations (IV.4) and (IV.5).


Condition (IV.6c) implies (III.2c), whereas condition (IV.6d) implies (III.2b), namely that there is some norm for which the F-Lipschitz problem associated with the canonical one has contractive constraints. Conditions (IV.6e) and (IV.6f) imply conditions (III.2d) and (III.2e), whereas condition (IV.6g) gives (III.2f). Indeed, such a condition asks that
$$|1 - \gamma_i \nabla_i g_i(x)| + \sum_{j \ne i} |\gamma_j \nabla_i g_j(x)| < 1\,. \qquad \text{(IV.7)}$$
If we let $\gamma_i = \gamma > 0$ $\forall i$, with $\gamma$ small enough so that $1 - \gamma \nabla_i g_i(x) \ge 0$, then the preceding inequality is satisfied if
$$\gamma_i \Bigl(-\nabla_i g_i(x) + \sum_{j \ne i} |\nabla_i g_j(x)|\Bigr) < 0\,,$$
which is ensured by condition (IV.6g).

Condition (IV.6h) implies (III.2g), whereas condition (IV.6i) implies (III.2h). We can see this by noting that such a condition asks
$$|1 - \gamma_i \nabla_i g_i(x)| + \sum_{j \ne i} |\gamma_j \nabla_i g_j(x)| < \frac{\delta}{\delta + \Delta}\,.$$
If we let $\gamma_i = \gamma > 0$ $\forall i$, this requires
$$\gamma \Bigl(\nabla_i g_i(x) - \sum_{j \ne i} |\nabla_i g_j(x)|\Bigr) > 1 - \frac{\delta}{\delta + \Delta}\,,$$
which asks
$$\gamma > \bar{\gamma} = \max_{x \in \mathcal{D}} \frac{1 - \frac{\delta}{\delta + \Delta}}{\nabla_i g_i(x) - \sum_{j \ne i} |\nabla_i g_j(x)|}\,.$$
If $\gamma > \bar{\gamma}$, then the function $F(x)$ built from $G(x)$ is contractive with respect to the infinity norm, which is ensured by condition (IV.6i). Therefore, by the conditions of the theorem, problem (IV.2) satisfies the F-Lipschitz qualifying properties. This concludes the proof.

It is possible to use this theorem to show when convex optimization problems can be cast as F-Lipschitz and thus solved much more efficiently. By the same theorem, it is also possible to show that, for example, a class of geometric and signomial programming problems [15] is solved by an F-Lipschitz approach.
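In practice, the first alternative of Theorem 4.1 can be verified numerically at sample points. The following sketch assumes a user-supplied (hypothetical) function `jac_G(x)` returning the matrix with entries $[\nabla_j G_i(x)]_{ij}$ and tests (IV.6b) – (IV.6d):

```python
import numpy as np

def check_first_alternative(jac_G, x):
    J = np.asarray(jac_G(x))
    n = J.shape[0]
    diag_positive = all(J[i, i] > 0 for i in range(n))              # (IV.6b)
    offdiag_nonpos = all(J[i, j] <= 0 for i in range(n)
                         for j in range(n) if j != i)               # (IV.6c)
    # (IV.6d): grad_i G_i(x) > sum_{j != i} |grad_i G_j(x)|, i.e.,
    # strict diagonal dominance of J along each column i
    dominance = all(J[i, i] > sum(abs(J[j, i]) for j in range(n) if j != i)
                    for i in range(n))
    return diag_positive and offdiag_nonpos and dominance
```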


A. Convex Optimization

In this subsection, we illustrate by examples that some convex optimization problems can be converted into F-Lipschitz problems and solved by the quick methods described in Section III-B. Recall that the traditional Lagrangian methods to compute the optimal solution, namely the interior point methods for convex problems, require either more iterations or more tedious algebraic steps than the method offered by our F-Lipschitz optimization.

1) Example 1: Consider the following optimization problem:
$$\begin{aligned}
\min_{x,y} \quad & a e^{-x} + b e^{-y} & \text{(IV.8a)}\\
\text{s.t.} \quad & x - 0.5y - 1 \le 0 & \text{(IV.8b)}\\
& -0.5x + 2y \le 0 & \text{(IV.8c)}\\
& x \ge 0\,, \quad y \ge 0\,,
\end{aligned}$$

where $a > 0$, $b > 0$, and $x, y \in \mathbb{R}$. The problem is convex, since the objective function and the constraints are convex. Slater's condition applies, so the problem is feasible and admits an optimal solution [1]. However, there is no need to use the KKT conditions to compute the solution. Theorem 4.1 applies, since the objective function of (IV.8) is decreasing. For the first constraint we have $\nabla_x(x - 0.5y - 1) = 1 > 0$ and $\nabla_y(x - 0.5y - 1) = -0.5 < 0$. For the second constraint, we have $\nabla_x(-0.5x + 2y) = -0.5 < 0$ and $\nabla_y(-0.5x + 2y) = 2 > 0$. It follows that conditions (IV.6b) and (IV.6c) hold. Moreover,
$$\nabla_x(x - 0.5y - 1) = 1 > |\nabla_x(-0.5x + 2y)| = 0.5\,,$$
$$\nabla_y(-0.5x + 2y) = 2 > |\nabla_y(x - 0.5y - 1)| = 0.5\,,$$
which means that the first alternative of condition (IV.6d) holds. Therefore, we conclude that Theorem 4.1 applies and the optimal solution to problem (IV.8) is given by the system of equations of the constraints at equality, namely
$$x - 0.5y - 1 = 0\,, \qquad -0.5x + 2y = 0\,,$$
which gives $x^* = 8/7$ and $y^* = 2/7$. The same result could have been achieved by solving the equations given by the KKT conditions, but this would have required many more algebraic steps than the ones we used with the F-Lipschitz optimization.
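The same solution can also be reached by the contraction iteration behind the F-Lipschitz method. A minimal sketch (ours, with step sizes $\gamma_i = 1/\nabla_i g_i$ as suggested by the proof of Theorem 4.1):

```python
def solve_example_1(tol=1e-12, max_iter=100):
    x, y = 0.0, 0.0   # any starting point in the box works
    gamma1 = 1.0      # 1 / d(x - 0.5y - 1)/dx
    gamma2 = 0.5      # 1 / d(-0.5x + 2y)/dy
    for _ in range(max_iter):
        x_new = x - gamma1 * (x - 0.5 * y - 1.0)   # = 0.5*y + 1
        y_new = y - gamma2 * (-0.5 * x + 2.0 * y)  # = 0.25*x
        if max(abs(x_new - x), abs(y_new - y)) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y

print(solve_example_1())  # approximately (8/7, 2/7)
```

The iteration map $(x, y) \mapsto (0.5y + 1,\ 0.25x)$ has a Jacobian with infinity norm $0.5 < 1$, so it is a contraction and converges to the unique fixed point $x^* = 8/7$, $y^* = 2/7$.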


2) Example 2: In this example we show that convex optimization problems may have a hidden F-Lipschitz nature. By a simple change of variables, the original convex problem can be converted into an F-Lipschitz one. Consider the following convex optimization problem:
$$\begin{aligned}
\min_{x,y,z} \quad & a e^{-x} + b e^{-y} + c e^{z} & \text{(IV.9a)}\\
\text{s.t.} \quad & 2x - 0.5y + z + 3 \le 0 & \text{(IV.9b)}\\
& -x + 2y - z^{-1} + 1 \le 0 & \text{(IV.9c)}\\
& -3x - y + z^{-2} + 2 \le 0 & \text{(IV.9d)}\\
& x_{\min} \le x \le x_{\max}\,, \quad y_{\min} \le y \le y_{\max}\,, \quad z_{\min} \le z \le z_{\max}\,,
\end{aligned}$$
where $a > 0$, $b > 0$, $x, y, z \in \mathbb{R}$, and we assume that $z_{\max} \le 0.5$. Theorem 4.1 does not apply here because, for instance, $\nabla_z(2x - 0.5y + z + 3) > 0$ (condition (IV.6c) does not hold). However, we can make the variable substitution $t = z^{-1}$, and we obtain

$$\begin{aligned}
\max_{x,y,t} \quad & -a e^{-x} - b e^{-y} - c e^{-t} & \text{(IV.10a)}\\
\text{s.t.} \quad & 2x - 0.5y + t^{-1} + 3 \le 0 & \text{(IV.10b)}\\
& -0.5x + 2y - t + 1 \le 0 & \text{(IV.10c)}\\
& -0.5x - y + t^2 + 2 \le 0 & \text{(IV.10d)}\\
& x_{\min} \le x \le x_{\max}\,, \quad y_{\min} \le y \le y_{\max}\,, \quad 1/z_{\max} \le t \le 1/z_{\min}\,.
\end{aligned}$$

Clearly, the optimal solution to problem (IV.10), $x^*, y^*, t^*$, is the same as that of problem (IV.9), with $z^* = 1/t^*$. It is easy to see that conditions (IV.6a), (IV.6b) and condition (IV.6c) hold; therefore, Theorem 4.1 applies and problem (IV.10) is an F-Lipschitz optimization problem, the solution of which is achieved by solving the system of equations given by all the constraints at equality.

B. Geometric Programming

Geometric programming is an area of mathematical optimization that has been rediscovered in recent years [15]. It has found numerous applications, particularly in wireless networks and electronic circuit design. The essential feature of a geometric program is that it can be mechanically converted into a convex optimization problem, for which there are interior point methods to compute the optimal solution. However, we show that there is a vast class of geometric programming problems that can be solved much faster and more cheaply by F-Lipschitz optimization. By this result, there is no need for a geometric program to be transformed into a convex optimization problem and then solved by interior point methods. This has obvious advantages for the computation of the optimal solution. It also has the advantage that the computation of the optimal solution is easy to distribute, as required in many systems such as wireless sensor networks. A geometric optimization problem has the following form:

$$\begin{aligned}
\min_x \quad & g_0(x) & \text{(IV.11a)}\\
\text{s.t.} \quad & g_i(x) \le 1\,, \quad i = 1, \ldots, l & \text{(IV.11b)}\\
& p_i(x) = 1\,, \quad i = l+1, \ldots, m & \text{(IV.11c)}\\
& x \in \mathcal{D}\,, \quad x \succ 0\,, & \text{(IV.11d)}
\end{aligned}$$
where the objective function (IV.11a) and the inequality constraints (IV.11b) are posynomials and the equality constraints are monomials [15]. More specifically, $x_i \in \mathbb{R}$ $\forall i$, the posynomial constraints are $g_i(x): \mathcal{D} \to \mathbb{R}$ with
$$g_i(x) = \sum_{k=1}^{K} c_{ik}\, x_1^{a_{i1k}} x_2^{a_{i2k}} \cdots x_m^{a_{imk}}\,, \qquad i = 1, \ldots, l\,,$$
whereas the monomial constraints are $p_i(x): \mathcal{D} \to \mathbb{R}$ with
$$p_i(x) = c_i\, x_1^{b_{i1}} x_2^{b_{i2}} \cdots x_m^{b_{im}}\,, \qquad i = l+1, \ldots, m\,,$$
where $c_{ik} > 0$, $a_{ijk} \in \mathbb{R}$, $b_{ij} \in \mathbb{R}$, $\forall i, j, k$.
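As a reference, a small sketch (ours, not the paper's) of evaluating these building blocks; `c`, `a`, `c0`, and `b` are illustrative coefficient and exponent inputs:

```python
from math import prod

def posynomial(x, c, a):
    # g(x) = sum_k c[k] * prod_j x[j] ** a[k][j], with c[k] > 0
    return sum(c_k * prod(x_j ** a_kj for x_j, a_kj in zip(x, a_k))
               for c_k, a_k in zip(c, a))

def monomial(x, c0, b):
    # p(x) = c0 * prod_j x[j] ** b[j]
    return c0 * prod(x_j ** b_j for x_j, b_j in zip(x, b))
```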

Clearly, a geometric optimization problem can always be easily rewritten in the form (IV.1), so that we can check whether it is an F-Lipschitz problem by using Theorem 4.1. However, given the special structure of a geometric program, there are some sufficient easy-to-check conditions telling whether it is F-Lipschitz, as we show in the following corollary, which applies in some special cases:

Corollary 4.2: Consider the geometric optimization problem (IV.11). Let
$$\mathcal{A} = \{x \in \mathcal{D} \,|\, g_i(x) \le 1\,, \ i = 1, \ldots, l\,, \ p_i(x) = 1\}\,.$$
Problem (IV.11) is an F-Lipschitz problem if the following conditions simultaneously hold:
1) $\nabla g_0(x) \prec 0$ $\forall x \in \mathcal{A}$;
2) $a_{iik} > 0$ and $b_{ii} > 0$ $\forall i$, and $a_{ijk} \le 0$, $b_{ij} \le 0$ for $i \ne j$;
3) $\nabla_i G_i(x) > \sum_{j \ne i} |\nabla_i G_j(x)|$ $\forall i$;
4) $\mathcal{D} = [x_{1,\min}, x_{1,\max}] \times [x_{2,\min}, x_{2,\max}] \times \ldots \times [x_{n,\min}, x_{n,\max}]$, with $0 < x_{i,\min} < x_{i,\max} < \infty$ $\forall i$.

Proof: The first condition of the corollary ensures that the problem can be converted into a maximization problem with an increasing objective function. The second condition ensures that the diagonal elements of the Jacobian of the vector function given by the left-hand side of the constraints are positive and the off-diagonal elements are always negative, since $x \succ 0$. The third condition of the corollary ensures that it is possible to transform the constraints into contractive functions. Thus, Theorem 4.1 holds, and this concludes the proof.

Remark 4.3: Corollary 4.2 gives only sufficient conditions. If a geometric programming problem does not satisfy the conditions of Corollary 4.2, it is still possible that it is F-Lipschitz if the conditions of Theorem 4.1 are satisfied. The reason for the corollary to be only sufficient is that only some of the conditions in Theorem 4.1 are covered by it. The previous corollary says that if 1) the objective function of a geometric programming problem is decreasing, 2) constraint $i$, $\forall i$, has the variable $x_i$ with a positive exponent whereas the other variables of the constraint always have negative exponents, 3) the Jacobian is diagonally strictly dominant per row, and 4) there is a box condition on the decision variables, then the problem is F-Lipschitz.

V. Examples of Applications

In this section, we show that the wireless sensor network motivating examples of Section II are F-Lipschitz. We also show examples of convex problems that can be solved by F-Lipschitz optimization.

A. Computation of a Norm

Here we illustrate the F-Lipschitz optimization by solving the problem of the wireless sensor network example of Section II-A. Observe that from the constraints of problem (II.2) it follows immediately that $0 \prec x \prec \mathbf{1}$. Observe also that the qualifying conditions (IV.6e), (IV.6f) and (IV.6g) of Theorem 4.1 hold. In particular, we define $G_i(x) = x_i + \sqrt{x_i} \sum_{j \in \Theta_i} \sqrt{x_j} - \gamma_{\max}$, and we have
$$\nabla_i G_i(x) = 1 + \frac{1}{2\sqrt{x_i}} \sum_{j \in \Theta_i} \sqrt{x_j}$$

[Fig. 4 plots the components of x versus the iteration number, converging within about 6 iterations.]

Fig. 4. Qualitative convergence behavior of the iterations (III.4) for the wireless sensor network optimization problem (II.2) of Section II-A with $n = 10$ nodes and $|\Theta_i| = 5$. The convergence is reached quickly, in this case in less than 6 iterations. The iterations are initialized with $1/n$.

and
$$\nabla_i G_j(x) = \frac{1}{2\sqrt{x_i}} \sqrt{x_j} \quad \text{if } j \in \Theta_i\,,$$
while $\nabla_i G_j(x) = 0$ if $j \notin \Theta_i$. It follows that condition (IV.6g) holds. Therefore, Theorem 4.1 applies and problem (II.2) has the optimal solution satisfying all the constraints at equality.

The solution could be computed by Algorithm (III.3) for a centralized solution, or by Algorithm (III.4) in a distributed set-up. In both cases, we have to transform the problem to the form (III.1) by means of the positive scalars $\gamma_i$:
$$F_i(x) = x_i - \gamma_i \left(x_i + \sqrt{x_i} \sum_{j \in \Theta_i} \sqrt{x_j} - \gamma_{\max}\right)\,, \qquad i = 1, \ldots, n\,. \qquad \text{(V.1)}$$
This function is contractive Lipschitz provided that one chooses
$$\gamma_i \le \min_{0 \prec x \prec 1} \frac{1}{\nabla_i g_i(x)}\,.$$

See the proof of Theorem 4.1 for the details. We now present some numerical examples. We consider a network composed of $n = 10$ nodes, with an average number of $|\Theta_i| = 5$ neighbors per node. An example of the distributed computation obtained by algorithm (III.4) with the functions $f_i(x)$ given by (V.1) is reported in Fig. 4. The algorithm converges quickly. Specifically, we assumed that convergence is reached when $|x_i(k+1) - x_i(k)| \le \epsilon$, with $\epsilon$ being the desired accuracy. From Monte Carlo simulations, we observed that the algorithm converges in about 5–10 iterations on average by setting $\epsilon = 10^{-8}$, which is quite a small value compared to the order of magnitude of the optimal solution ($10^{-2}$).
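The structure of these iterations can be sketched as follows (our illustration, not the paper's code, shown in synchronous form): every node $i$ repeatedly applies $F_i$ from (V.1) and projects the result back onto the feasible box; `neighborhoods[i]` plays the role of $\Theta_i$, while `gamma_max` and the common step size `gamma` are assumptions chosen to keep $F$ contractive:

```python
import math

def iterate_norm_problem(neighborhoods, gamma_max=0.9, gamma=0.1,
                         eps=1e-8, max_iter=1000):
    n = len(neighborhoods)
    x = [1.0 / n] * n                     # initialization used in Fig. 4
    for _ in range(max_iter):
        g = [x[i] + math.sqrt(x[i]) * sum(math.sqrt(x[j])
                                          for j in neighborhoods[i]) - gamma_max
             for i in range(n)]
        # x_i(k+1) = [F_i(x(k))]_D, with F_i(x) = x_i - gamma * g_i(x)
        x_new = [min(max(x[i] - gamma * g[i], 1e-12), 1.0) for i in range(n)]
        if max(abs(a - b) for a, b in zip(x_new, x)) <= eps:
            return x_new
        x = x_new
    return x
```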

We solved problem (II.2) also by traditional Lagrangian methods, which use iterations similar to (III.5) and (III.6). The method is commonly implemented by the Matlab function fmincon. We noticed that convergence is achieved in approximately 30 iterations and with twice the computations needed by algorithm (III.3). Therefore, the computational and communication efficiency of our new method is remarkable.

B. Radio Power Allocation

The power allocation problem of Section II-B is F-Lipschitz. We can see this by making the variable substitution $x_i = -p_i$. Then, problem (II.3) can be rewritten as

$$\begin{aligned}
\max_x \quad & x & \text{(V.2)}\\
\text{s.t.} \quad & G_{ii} x_i + S_{\min}\Bigl(\sigma_i - \sum_{k \ne i} G_{ik} x_k + \sum_{k \ne i} M_{ik}\, x_i^2 x_k^2\Bigr) \le 0\,, \quad i = 1, \ldots, n\,,\\
& -p_{\max} \mathbf{1} \preceq x \preceq -p_{\min} \mathbf{1}\,.
\end{aligned}$$
Since $M_{ik}$ is smaller than $G_{ii}$ and $G_{ik}$ $\forall i, k$, it is reasonable to assume that for $-\mathbf{1} p_{\max} \preceq x \preceq -\mathbf{1} p_{\min}$ the following conditions hold: $G_{ii} > 2 S_{\min} \sum_{k \ne i} M_{ik}\, x_i x_k^2$, $G_{ik} > 2 M_{ik}\, x_i^2 x_k$, and
$$G_{ii} + 2 S_{\min} \sum_{k \ne i} M_{ik}\, p_{\min}^3 \ge S_{\min} \sum_{k \ne i} G_{ik}\,.$$

This assumption gives
$$\nabla_i g_i(x) > \sum_{j \ne i} |\nabla_j g_i(x)|\,,$$
for $-\mathbf{1} p_{\max} \preceq x \preceq -\mathbf{1} p_{\min}$. Therefore, Theorem 4.1 applies (see conditions (IV.6c) and (IV.6d)) and problem (V.2) has the optimal solution satisfying all the constraints at equality.

The solution could be centrally computed by algorithm (III.3), or by algorithm (III.4) in a distributed set-up. In this last case, we have to transform the problem into the form (III.1) by positive scalars $\gamma_i$, as shown in Section IV, thus obtaining the equivalent constraints $f_i(x) = x_i - \gamma_i g_i(x)$, where
$$g_i(x) = G_{ii} x_i + S_{\min}\Bigl(\sigma_i - \sum_{k \ne i} G_{ik} x_k + \sum_{k \ne i} M_{ik}\, x_i^2 x_k^2\Bigr) \quad \forall i\,.$$


This function is contractive Lipschitz provided that one chooses
$$\gamma_i \le \min_{-\mathbf{1} p_{\max} \preceq x \preceq -\mathbf{1} p_{\min}} \frac{1}{\nabla_i g_i(x)} = \min_{-\mathbf{1} p_{\max} \preceq x \preceq -\mathbf{1} p_{\min}} \frac{1}{G_{ii} + 2 S_{\min} \sum_{k \ne i} M_{ik}\, x_i x_k^2} = \frac{1}{G_{ii} + 2 S_{\min}\, p_{\max}^3 \sum_{k \ne i} M_{ik}}\,,$$
which ensures contractivity with respect to the infinity norm.
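Putting the pieces together, the distributed power update can be sketched as follows (our illustration, not the paper's code): each iteration evaluates $g_i$, takes a step $x_i \leftarrow x_i - \gamma_i g_i(x)$, projects onto the box, and finally recovers the powers $p = -x$; the channel data `G`, `M`, `sigma` and the scalars are illustrative assumptions:

```python
import numpy as np

def power_control(G, M, sigma, S_min, p_min, p_max, iters=50):
    n = G.shape[0]
    x = -p_max * np.ones(n)              # start from maximum power (x = -p)
    # step sizes from the contractivity bound derived above
    gamma = 1.0 / (np.diag(G) + 2.0 * S_min * p_max**3 *
                   (M.sum(axis=1) - np.diag(M)))
    for _ in range(iters):
        g = np.array([G[i, i] * x[i] + S_min * (sigma[i]
                      - sum(G[i, k] * x[k] for k in range(n) if k != i)
                      + sum(M[i, k] * x[i]**2 * x[k]**2
                            for k in range(n) if k != i))
                      for i in range(n)])
        x = np.clip(x - gamma * g, -p_max, -p_min)
    return -x                            # transmit powers p
```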

We performed numerical simulations in Matlab with 10 transmitter–receiver pairs of nodes. We considered a wireless sensor network where the figures for the wireless channel and the noise are taken coherently with the Tmote Sky sensor nodes [29], which feature the CC2420 radio transceiver by Chipcon [30]. The noise is set to $\sigma_i = -130$ dBm $\forall i$, $S_{\min} = 1$ $\forall i$, $G_{ij} = -90$ dBm $\forall i \ne j$, and $M_{ij} = -120$ dBm $\forall i \ne j$. Moreover, $p_{\min} = -25$ dBm and $p_{\max} = 0$ dBm. See [31] for details. We observed convergence of algorithm (III.4) in less than 10 iterations, whereas a traditional method based on the Lagrangian dual function converges in about 40 iterations. Once again, the reduction in the number of iterations and in the computational complexity ensured by our method is remarkable.

VI. Conclusions and Future Work

In this paper we presented F-Lipschitz optimization, which enables fast computation of the solution of a class of convex and non-convex problems. The central idea was to show that the optimal solution is achieved when all the constraints hold at equality. We showed that this optimization method can solve problems much more efficiently than traditional Lagrangian methods, including convex problems. If an F-Lipschitz optimization problem must be solved by distributed operations, then our optimization method uses simple and fast distributed asynchronous algorithms, which are very appealing for wireless sensor networks. We showed that some typical optimization problems that arise in wireless sensor networks are F-Lipschitz, such as the distributed computation of a norm and radio power control.

We believe that F-Lipschitz optimization may have many developments. There can be more qualifying properties for which this optimization applies, such as when the constraints have a Jacobian that is always positive definite (which is sufficiently ensured by the qualifying F-Lipschitz conditions). In many situations, it could be interesting to approximate problems by F-Lipschitz ones, given the numerous useful properties of this optimization.

For example, if the Jacobian of the constraints does not have all the off-diagonal elements of the same sign, then one could set these outliers to zero and see how suboptimal the solution is, following an approach similar to the one proposed in [32]. Another interesting line of research concerns the extension of our method to more general objective functions, e.g., not necessarily increasing functions. The introduction of slack variables that make the constraints hold at equality might be useful, even though it appears challenging to tie these variables to the objective function and to show that the technique is more convenient than Lagrangian methods. For problems with general objective functions and constraints, it could be computationally useful to check the existence of conditions under which the constraints hold at equality. If we know that strong duality holds, then strictly positive Lagrange multipliers associated with the inequality constraints are a sufficient (but not necessary) condition for the solution to satisfy the constraints at equality. We are currently investigating when these conditions hold. Finally, we are currently investigating whether knowing that some of the inequality constraints hold at equality at the optimum helps to reduce the computations for the optimal solution. This might be done by not including these constraints in the Lagrangian.

References

[1] D. P. Bertsekas, Nonlinear Programming, 2nd ed. Athena Scientific, 2004.
[2] M. Rabbat and R. Nowak, "Distributed optimization in sensor networks," in ACM/IEEE IPSN, 2004.
[3] H. Gharavi and P. R. Kumar, "Toward a theory of in-network computation in wireless sensor networks," IEEE Communications Magazine, pp. 97–107, 2006.
[4] P. Wan and M. D. Lemmon, "Event-triggered distributed optimization in sensor networks," in ACM/IEEE IPSN, 2009.
[5] H. Wang, N. Agoulmine, M. Ma, and J. Yanliang, "Network lifetime optimization in wireless sensor networks," IEEE Journal on Selected Areas in Communications, vol. 28, no. 7, pp. 1127–1137, September 2010.
[6] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997.
[7] B. Johansson, "On distributed optimization in networked systems," Ph.D. dissertation, KTH, 2009.
[8] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Stanford University, Tech. Rep., January 2011.
[9] K. Chowdhury and I. Akyildiz, "Cognitive wireless mesh networks with dynamic spectrum access," IEEE Journal on Selected Areas in Communications, pp. 168–181, August 2008.
[10] L. Xiaojun, N. Shroff, and R. Srikant, "A tutorial on cross-layer optimization in wireless networks," IEEE Journal on Selected Areas in Communications, pp. 1452–1463, August 2006.
[11] S. S. Ram, A. Nedić, and V. V. Veeravalli, "A new class of distributed optimization algorithms: Application to regression of distributed data," Optimization Methods and Software, January 2009.
[12] R. D. Yates, "A framework for uplink power control in cellular radio systems," IEEE Journal on Selected Areas in Communications, vol. 13, pp. 1341–1347, September 1995.
[13] J. Herdtner and E. K. P. Chong, "Analysis of a class of distributed asynchronous power control algorithms for cellular wireless systems," IEEE Journal on Selected Areas in Communications, vol. 18, no. 3, pp. 436–446, March 2000.


[14] M. Schubert and H. Boche, QoS-Based Resource Allocation and Transceiver Optimization, Foundations and Trends in Communications and Information Theory. NOW Publishers, 2005.
[15] S. Boyd, S. J. Kim, L. Vandenberghe, and A. Hassibi, "A tutorial on geometric programming," Optimization and Engineering, vol. 8, no. 1, pp. 67–127, 2007.
[16] A. Speranzon, C. Fischione, and K. H. Johansson, "Distributed and collaborative estimation over wireless sensor networks," in IEEE Conference on Decision and Control, 2006.
[17] R. Carli, F. Fagnani, A. Speranzon, and S. Zampieri, "Communication constraints in the average consensus problem," Automatica, vol. 44, no. 3, March 2008.
[18] A. Speranzon, C. Fischione, K. H. Johansson, and A. Sangiovanni-Vincentelli, "A distributed minimum variance estimator for sensor networks," IEEE Journal on Selected Areas in Communications, vol. 26, pp. 609–621, May 2008.
[19] L. Shi, "Resource optimization for networked estimator with guaranteed estimation quality," Ph.D. dissertation, Caltech, 2008.
[20] C. Fischione, A. Speranzon, K. H. Johansson, and A. Sangiovanni-Vincentelli, "Peer-to-peer estimation over wireless sensor networks via Lipschitz optimization," in ACM/IEEE IPSN, 2009.
[21] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[22] G. L. Stüber, Principles of Mobile Communication. Kluwer Academic Publishers, 1996.
[23] S. Kandukuri and S. Boyd, "Optimal power control in interference-limited fading wireless channels with outage-probability specifications," IEEE Transactions on Wireless Communications, vol. 1, pp. 46–55, January 2002.
[24] J. Jahn, Vector Optimization. Springer-Verlag, 2004.
[25] C. W. Sung and K. K. Leung, "A generalized framework for distributed power control in wireless networks," IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2625–2635, July 2005.
[26] C. Fischione, "F-Lipschitz optimization," KTH Royal Institute of Technology, Tech. Rep. TRITA-EE 2011:042, 2009. [Online]. Available: http://www.kth.se/ees/forskning/publikationer?l=en UK
[27] B. Polyak, Introduction to Optimization. Optimization Software, Inc., 1987.
[28] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1985.
[29] Tmote Sky Data Sheet, Moteiv, San Francisco, CA, 2006. [Online]. Available: http://www.moteiv.com/products/docs/tmotesky-datasheet.pdf
[30] Chipcon Products, "2.4 GHz IEEE 802.15.4 ZigBee-ready RF Transceiver," Texas Instruments, Tech. Rep., 2006.
[31] C. Fischione, K. H. Johansson, A. Sangiovanni-Vincentelli, and B. Zurita Ares, "Energy consumption of minimum energy coding in CDMA wireless sensor networks," IEEE Transactions on Wireless Communications, vol. 8, pp. 985–994, February 2009.
[32] A. Rantzer, "On prize mechanisms in linear quadratic team theory," in IEEE Conference on Decision and Control, December 2007.
[33] M. A. Hanson, "On sufficiency of the Kuhn-Tucker conditions," Journal of Mathematical Analysis and Applications, vol. 80, pp. 545–550, 1981.
[34] R. Osuna-Gómez, A. Rufián-Lizana, and P. Ruíz-Canales, "Invex functions and generalized convexity in multiobjective programming," Journal of Optimization Theory and Applications, vol. 98, pp. 651–661, September 1998.
[35] R. Horst, P. M. Pardalos, and N. V. Thoai, Introduction to Global Optimization, Nonconvex Optimization and its Applications. Kluwer Academic Publishers, 1995.


Appendix

A. Proof of Theorem 3.3

By the F-Lipschitz qualifying conditions, $F(x)$ is a contractive function. It follows from the Leray–Schauder–Tychonoff fixed point theorem [6] that there exists a unique fixed point of $x = F(x)$ in the compact set $\mathcal{D}$. By our assumption, the fixed point $x^*$ is feasible for the F-Lipschitz optimization problem. It remains to show that the fixed point is optimal for problem (III.1). Suppose that $t \ne x^*$ is an arbitrary optimal point such that
$$t + b = F(t)\,, \qquad \text{(A.1)}$$
for some nonzero $b \succeq 0$. We show next that $x^*$ always gives a better cost function compared to $t$. Since $F(x)$ is contractive Lipschitz, we can use the mean value theorem for every component of $F(x)$, so that $\forall i$
$$f_i(t) = f_i(x^*) + \nabla f_i(z_i)^T (t - x^*)\,,$$
where $z_i \in \mathcal{D}$ and depends on $i$, $x^*$ and $t$. Collecting the previous expressions in vector notation, we have
$$F(t) = F(x^*) + A(t - x^*) = x^* + A(t - x^*)\,,$$
where $A = [a_{i,j}]$ is a matrix whose row $i$ is $\nabla f_i(z_i)^T$. $A$ is clearly related to the Jacobian of $F(x)$. Note that we used that $F(x^*) = x^*$. Eq. (A.1) can be rewritten as
$$y = Ay + b\,, \qquad \text{(A.2)}$$
with $y = x^* - t$. For any scalarization $\mu^T f_0(x)$, where $\mu \succ 0$, the change in the cost function from $t$ to $x^*$ is
$$\mu^T f_0(x^*) - \mu^T f_0(t) = c^T (x^* - t) = c^T y\,,$$
where $c = [c_i]$, $c_i > 0$, and we used the mean value theorem, since the cost function is continuous with bounded derivative. We want to show that $c^T y > 0$. We consider every case of the qualifying conditions.

When (III.2c) and (III.2b) hold, we have $a_{ij} \ge 0$ and $\|A\| < 1$. Then $(I - A)$ is invertible because $\rho(A) \le \|A\| < 1$, and $(I - A)^{-1} = I + A + A^2 + \ldots \succeq 0$ (positive matrix) with strictly positive diagonal elements. It follows that $c^T (I - A)^{-1} \succ 0$ and hence $c^T y = c^T (I - A)^{-1} b > 0$.

In the case when (III.2e) and (III.2f) hold, we have $a_{ij} \le 0$, $\sum_{i=1}^{n} |a_{ij}| < 1$ (or $\|A\|_1 < 1$), and $c = c\mathbf{1}$ for some $c > 0$. We need to show that $c^T y > 0$, where $y = Ay + b$. Since $\|A\|_1 < 1$, it follows that $\rho(A) < 1$ and $I - A$ is invertible, so our desired condition is $\mathbf{1}^T (I - A)^{-1} b = b^T (I - A^T)^{-1} \mathbf{1} > 0$. This is equivalent to showing that $b^T v > 0$ for $v = A^T v + \mathbf{1} = A^{2T} v + (A^T + I)\mathbf{1}$. Since $A^{2T} \succeq 0$ and $\rho(A^{2T}) \le \|A^2\|_1 < 1$, it follows that $(I - A^{2T})^{-1} \succeq 0$ with strictly positive diagonal elements. Hence,
$$b^T v = b^T (I - A^{2T})^{-1} (A^T + I)\mathbf{1} > 0\,,$$
since $(A^T + I)\mathbf{1} \succ 0$.

Suppose that the qualifying condition (III.2h) holds. Once again, we would like to show that $c^T y = c^T (I - A)^{-1} b = b^T (I - A^T)^{-1} c := b^T v > 0$, where we defined $v = (I - A^T)^{-1} c$, which means that
$$(I - A^T) v = c\,. \qquad \text{(A.3)}$$
Clearly, if $v \succ 0$ then $c^T y > 0$. From Eq. (A.3), we have
$$v_i = \sum_{j=1}^{n} a_{j,i} v_j + c_i\,, \quad \forall i\,. \qquad \text{(A.4)}$$
It follows that
$$v_{\max} \le \sum_{j=1}^{n} |a_{j,i}|\, v_{\max} + c_i\,, \quad \forall i\,,$$
where $v_{\max} = \max_i |v_i|$. Thus
$$v_{\max} \le \max_i \sum_{j=1}^{n} |a_{j,i}|\, v_{\max} + \max_i c_i\,,$$
whereby
$$v_{\max} \le \frac{\max_i c_i}{1 - \max_i \sum_{j=1}^{n} |a_{j,i}|}\,. \qquad \text{(A.5)}$$
The denominator of the previous inequality is always positive by the assumption that $\|\nabla F(x)\|_\infty < 1$.

[...] we have that $\lambda_{\min}$ is positive if the numerator and the denominator of the previous inequality are positive, which happens if
$$\max_i \sum_{j=1}^{n} |a_{j,i}| < 1 \qquad \text{and} \qquad \max_i \sum_{j=1}^{n} |a_{j,i}| < \cdots$$

[...] $> 0$, where $e_i$ is the all-zero vector with the exception of the $i$-th element, which is 1. It follows that
$$q_i(x) - q_i(y) \ge -\mathbf{1}^T \frac{(1 - \alpha_i)\, \nabla q_i(x)}{\mathbf{1}^T \nabla q_i(x)} \|x - y\|\,,$$
whereby the proposition follows by observing that
$$\eta_i(x, y) = -\frac{(1 - \alpha_i)\, \mathbf{1}}{\mathbf{1}^T \nabla q_i(x)} \|x - y\|\,. \qquad \text{(A.7)}$$

Lemma A.3: Let $f(x) = \sum_j \mu_j f_{0j}(x)$ be the scalar objective function associated with the vector optimization problem (III.1), where $0 < \mu_j \le 1$ $\forall j$. Let $L_j$ be the modulus of $f_{0j}(x)$ $\forall j$. Then $f(x)$ is invex with respect to the function $\eta_0(x, y): \mathcal{D} \times \mathcal{D} \to \mathbb{R}^n$, with
$$\eta_0(x, y) = -\frac{\sum_j \mu_j L_j\, \mathbf{1}}{\mathbf{1}^T \nabla f(x)} \|x - y\|\,. \qquad \text{(A.8)}$$
Proof: The proof follows the same steps as that of Lemma A.2.

We are now in the position to prove Theorem 3.8.
Proof: Strong duality follows by showing that the KKT conditions are necessary and sufficient. The necessary condition is given by the Mangasarian–Fromowitz constraint qualification [35, p. 25]. For this constraint qualification to hold, it is necessary that the gradient of the inequality constraints is a full rank matrix, which we know from Lemma 3.5. Finally, the sufficiency of the KKT conditions is ensured by showing that the objective function and the constraints have a common invex function, see [33]: From Lemma A.2 and Lemma A.3, we define the common invex function $\eta(x, y): \mathcal{D} \times \mathcal{D} \to \mathbb{R}^n$ such that
$$\eta(x, y) = \min_{i = 0, \ldots, n} \eta_i(x, y)\,,$$
where $\eta_0(x, y)$ is given in (A.8) and $\eta_i(x, y)$ is given in (A.7). Note that $-\infty < \eta_i(x, y) < 0$ for $i = 1, \ldots, n$, and that $-\infty < \eta_0(x, y) < 0$ because $\mathbf{1}^T \nabla f(x) > 0$ $\forall x \in \mathcal{D}$.

C. Proof of Claim 3.9

Since strong duality holds from Theorem 3.8, we use the Lagrange dual problem to establish the sensitivity. Consider the scalarized problem associated with (III.1), where the vector objective function is converted into a scalar function by the Pareto coefficients $\mu = [\mu_i]$, with $\mu \succ 0$ and $\mathbf{1}^T \mu = 1$ [21]. With the scalarized objective function of problem (III.1) being $f_0(x) = [f_{0i}(x)] \in \mathbb{R}^m$, with $f_{0i}(x): \mathcal{D} \to \mathbb{R}$, $i = 1, \ldots, n$, we have:
$$\begin{aligned}
\max_x \quad & \frac{\theta\, \mu^T f_0(x)}{\max_{x \in \mathcal{D}} \nabla \mu^T f_0(x)} & \text{(A.9a)}\\
\text{s.t.} \quad & x_i \le F_i(x)\,, \quad i = 1, \ldots, n & \text{(A.9b)}\\
& x \in \mathcal{D}\,.
\end{aligned}$$
Notice that we have normalized the scalarized objective function of problem (III.1) by an arbitrary constant given by $\theta > 0$ divided by the maximum of the derivative. Clearly, this does not affect the optimal solution as long as $\theta > 0$. Let $\lambda = [\lambda_i] \in \mathbb{R}^n$ be the vector of Lagrange multipliers. We show next that $0 \prec \lambda \prec 1$ provided that $\theta$ is small enough, which ensures a low sensitivity of the optimization problem to perturbations of the constraints [1]. The Lagrange multipliers associated with this problem must satisfy the following condition, coming from the derivative of the Lagrange dual function:
$$(I - \nabla F(x))\, \lambda = \frac{\theta\, \nabla \mu^T f_0(x)}{\max_{x \in \mathcal{D}} \nabla \mu^T f_0(x)}\,, \qquad \text{(A.10)}$$
where $F(x) = [F_1(x), F_2(x), \ldots, F_n(x)]^T$. For simplicity of notation, let
$$c(x) = \frac{\theta\, \nabla \mu^T f_0(x)}{\max_{x \in \mathcal{D}} \nabla \mu^T f_0(x)}\,.$$
Clearly, $0 < c_i < \theta$ $\forall i$. Then, it trivially follows that (A.10) gives
$$\lambda = \nabla F(x^*)\, \lambda + c(x^*)$$
at the optimal point $x^*$. We observe that this equation can be immediately arranged in a form identical to Eq. (A.3), where $\nabla F(x^*)$ plays the same role as $A$ and $c(x^*)$ as $c$. Hence, we can use exactly the same arguments whereby $v \succ 0$ to prove that $\lambda \succ 0$ $\forall \theta > 0$. In addition, we can use $\theta$ to decrease the upper bound on the Lagrange multipliers (which is the same as the upper bound on $v$ given by Eq. (A.5)) and achieve that $\lambda \prec 1$. This concludes the proof.