
Extensions of Fast-Lipschitz Optimization

arXiv:1309.0462v1 [math.OC] 2 Sep 2013

Martin Jakobsson, Carlo Fischione, and Pradeep Chathuranga Weeraddana

Abstract—The need for fast distributed solvers for optimization problems in networked systems has motivated the recent development of the Fast-Lipschitz optimization framework. In such an optimization, problems satisfying certain qualifying conditions, such as monotonic cost functions and contractive constraints, have a unique optimal solution obtained via fast distributed algorithms that compute the fixed point of the constraints. This paper extends the set of problems for which the Fast-Lipschitz framework applies. Existing assumptions on the problem form are relaxed, and new and generalized qualifying conditions are established by novel results based on Lagrangian duality. It is shown in which cases Fast-Lipschitz optimization applies to problems with more constraints than decision variables, and to problems with fewer constraints than decision variables. New results are obtained by imposing non-strict monotonicity of the cost functions. Finally, the extended Fast-Lipschitz framework is applied to non-convex optimization problems, such as an opportunistic uplink radio power control problem in wireless communication systems, where the previously unknown optimality of existing iterative methods is now established.

I. INTRODUCTION

In networked optimization problems, the network nodes or agents must coordinate their actions to optimize a network-wide objective function. When information such as the nodes' objectives, constraints and decision variables is distributed among the nodes, physically scattered across the network, or too large in volume to collect centrally, it may be impractical or even impossible to compute the solution at a central location. For example, collecting information in one place might be too expensive if the network has limited communication resources, or it may be too slow if the solution is needed at the local nodes in real time. In these situations, fast distributed solution algorithms must be used.

Distributed optimization has a long history, and many of the recent developments build upon the work of Tsitsiklis [2], [3]. Several approaches exist for solving these problems. In primal or dual decomposition methods, the primal or the dual problem is decomposed into local subproblems solved at the nodes. The subproblems are coordinated through a centralized master problem, which is usually solved by gradient or subgradient methods [4]. These methods have found many applications in network utility maximization, e.g., [5]–[7]. Due to the slow convergence of subgradient algorithms, recent works have explored higher order methods. For example, [8] replaces the subgradient step with a Newton-like step. Another decomposition approach, which is faster and more robust than the standard decomposition methods, is the alternating direction method of multipliers (ADMM).

The authors are with the Department of Automatic Control and the ACCESS Linnaeus Center, KTH Royal Institute of Technology, Stockholm, Sweden. Email: {mjakobss, carlofi, chatw}@kth.se. This work was supported by the EU projects NoE HYCON2 and STREP Hydrobionets. A preliminary version of this work was presented in [1].

ADMM has recently attracted substantial interest, especially for problems with large data sets [9]–[13].

Although the decomposition methods mentioned above distribute the computational workload over the network nodes, the subproblems must still be coordinated among the nodes. For example, dual decomposition based methods must update dual variables in a central master problem. This requires the network to iteratively 1) transfer information from all nodes to some central "master" node; 2) centrally update the dual variables; 3) broadcast the updated dual variables to all nodes of the network; and return to 1) until convergence.

To avoid the centralized master node, peer-to-peer or multi-agent methods coordinate the subproblems through local neighbor interactions based on consensus algorithms, e.g., [14]–[18]. In these algorithms, nodes update their decision variables as convex combinations of the decision variables of their neighbors, without a centralized master. In [19], the consensus algorithm has been combined with gradient descent to solve an unconstrained optimization problem where the objective is a sum of local, convex functions. The work is extended in [20], [21], which investigate constraints and randomness. While the previous papers solve the primal problem, [22]–[24] use consensus-based algorithms for the dual problem. Higher order methods are considered also for peer-to-peer optimization: e.g., [25] solves an unconstrained primal problem, whereas [26] solves a linearly constrained dual problem, both by approximating Newton's algorithm through consensus. Consensus based methods have many benefits, such as resilience to node failures and changing network topology. However, since every round of consensus requires message passing, these methods may also suffer from communication overhead. A recent study of the tradeoff between communication and local computation can be found in [27]. The communication overhead is a problem especially in large scale distributed networks or wireless sensor networks, where the energy expenditure for communication can be orders of magnitude larger than the energy for computation [28].

The methods discussed thus far assume convex problems. There are other classes of algorithms that do not rely necessarily on convexity, but on other structural properties. Three such classes are abstract optimization [29], which generalizes linear optimization; monotonic optimization [30]–[32], where the monotonicity of the cost function is used to iteratively refine a solution within bounds of the feasible region; and Interference Function optimization [33]–[37], which is the fundamental framework to solve radio power control problems over wireless networks. Given the importance of Interference Function optimization as a precursor of Fast-Lipschitz optimization, and considering that we will give some application examples later on in this paper, we give below some technical detail on such


an optimization framework.

In Interference Function optimization, typical radio power control problems are solved in a simple distributed (as in decentralized) way. In such an optimization approach, it is assumed that the nodes of a wireless network transmit signals at a certain level of radio power, say node $i$ transmits with power $p_i$. The signal is received at the intended receiver corrupted by multiplicative wireless channel attenuations and additive interference from other transmitter nodes. The level of power that the transmit power of node $i$ has to overcome at the receiver in order to get the signal decoded is usually denoted by $I_i(p)$, the interference function of transmitter $i$ [36], where $p = [p_1, p_2, \dots, p_n]$ is the vector of all transmit radio powers of the $n$ transmitters in the wireless network. The goal of the basic power control problem is to minimize the radio powers $p_i$ while overcoming the interference at each receiver, i.e.,

$$\begin{aligned} \min_p \ & p \\ \text{s.t.} \ & p_i \geq I_i(p) \quad \forall i. \end{aligned} \tag{1}$$

The solution of this problem is a particularly successful instance of distributed optimization. Affine versions of $I_i(p)$ are the simplest and best studied type of interference function, but in theory one can consider functions $I(p)$ of any form. The first distributed algorithm to solve such a problem was proposed in [33], and improved in [34], [35]. The algorithm was later generalized to the Interference Function framework by Yates [36]. In this framework, a function $I(p)$ is called standard if, for all $p \geq 0$, it fulfills
• Monotonicity: If $p \geq p'$, then $I(p) \geq I(p')$,
• Scalability: For all $\alpha > 1$, $\alpha I(p) > I(\alpha p)$.

When problem (1) above is feasible, and the functions $I_i(p)$ are standard, the unique solution is given by the fixed point of the iteration

$$p_i^{k+1} := I_i(p^k), \tag{2}$$

or $p^{k+1} := I(p^k)$ in vector form. Here, $p_i^k$ is the power of transmitter $i$ at time $k$ and $p^k = [p_1^k, p_2^k, \dots, p_n^k]$. The computation of the optimal solution by these iterations is much simpler than using the classical parallelization and decomposition methods. This is because there is no longer a need to centrally collect, compute and redistribute the coupling variables of the problem, since $I_i(p^k)$ can be known locally at node $i$ [36]. Even in a centralized setting, iteration (2) is simpler than traditional Lagrangian methods, since no dual variables need to be stored and manipulated. The iterations require only that every node successively updates its transmit power using local knowledge of the other nodes' current decision variables (radio powers). Another advantage is that the algorithm converges even when such knowledge is delayed, i.e., when the decision variables $p_j^k$ of other nodes arrive with some delay at node $i$ [38].

Extensions of the Interference Function framework have been proposed in [39], where optimization problems whose interference functions are not standard by Yates' definition are investigated. Instead, they introduce Type-II standard functions, which for all $p \geq 0$ fulfill
• Type-II monotonicity: If $p \leq p'$, then $I(p) \geq I(p')$,
• Type-II scalability: For all $\alpha > 1$, $I(\alpha p) > (1/\alpha) I(p)$.
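For illustration, here is a minimal runnable sketch of iteration (2) under an affine (hence standard) interference model; the gain matrix, noise terms and all numbers are our own assumptions, not taken from the cited works:

```python
import numpy as np

# Affine interference model I(p) = G p + eta, with G >= 0 elementwise
# and spectral radius rho(G) < 1, so that I is a standard interference
# function and iteration (2) converges to the unique fixed point.
G = np.array([[0.0, 0.2, 0.1],
              [0.1, 0.0, 0.2],
              [0.2, 0.1, 0.0]])
eta = np.array([0.1, 0.1, 0.1])

def I(p):
    return G @ p + eta

p = np.zeros(3)
for k in range(100):
    p = I(p)             # p^{k+1} := I(p^k); node i only needs row i

print(p)                  # fixed point p* = I(p*), optimum of problem (1)
print(np.linalg.solve(np.eye(3) - G, eta))   # closed-form check
```

In a network implementation, node $i$ evaluates only its own component $I_i(p^k)$ from the powers it overhears, which is what makes the iteration distributed.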

These functions are shown to have the same fixed point properties as Yates' standard functions, i.e., problem (1) with Type-II standard constraints can be solved through repeated iterations of the constraints (2). Further extensions of the Interference Function framework have been proposed in [40]. However, these extensions focus on the existence of fixed points, without notions of optimality.

Fast-Lipschitz optimization is a natural generalization of the Interference Function approach to solving distributed optimization problems over networks by fixed point iterations similar to (2), but when the constraints are neither standard nor Type-II standard [28]. It also considers more general objective functions $g_0(p)$ for problem (1). These application scenarios motivate Fast-Lipschitz optimization, as we also illustrate by an example in Section IV-B. The framework considers a class of possibly non-convex and multi-objective problems with monotonic objective functions. These problems have unique Pareto optimal solutions, well defined by a contractive system of equations formed by the problem constraints. Therefore, Fast-Lipschitz problems are solved without having to introduce Lagrangian functions and dual variables or consensus based iterations. This makes the framework particularly well suited when highly decentralized solutions, with few coordination messages, are required. This is important in typical areas such as wireless sensor networks and multi-agent systems.

In this paper, we substantially extend the class of problems that are currently solvable by the Fast-Lipschitz framework. This is established by 1) generalizing the original qualifying conditions, 2) considering over-constrained and under-constrained optimization problems, and 3) studying non-strictly monotonic cost functions.

The remainder of this paper is organized as follows. In Section II we clarify our notation and recall Fast-Lipschitz optimization for the sake of making a comparison with the new extensions presented in this paper. The main contribution of the paper is found in Section III, where a new set of qualifying conditions for Fast-Lipschitz optimization is presented. Section IV starts with an example that illustrates some of the new results of this paper. Next, the theory is applied to uplink radio power control with outages – a power control problem that does not fall within the Interference Function framework. Finally, the paper is concluded in Section V.

II. PRELIMINARIES

This section clarifies notation and recalls Fast-Lipschitz optimization to provide the essential background definitions for the core contribution of this paper.

A. Notation

Vectors and matrices are denoted by bold lower and upper case letters, respectively. The components of a vector $x$ are denoted $x_i$ or $[x]_i$. Similarly, the elements of the matrix $A$ are denoted $A_{ij}$ or $[A]_{ij}$. The transpose of a vector or matrix is denoted $(\cdot)^T$. $I$ and $\mathbf{1}$ denote the identity matrix and the vector of ones. A vector or matrix where all elements are zero is denoted by $\mathbf{0}$.


The gradient $\nabla f(x)$ should be interpreted as $[\nabla f(x)]_{ij} = \partial f_j(x)/\partial x_i$, whereas $\nabla_i f(x)$ denotes the $i$th row of $\nabla f(x)$. Note that $\nabla f(x)^k = (\nabla f(x))^k$, which is not to be confused with the $k$th derivative. The spectral radius of $A$ is denoted $\rho(A)$. Vector norms are denoted $\|\cdot\|$ and matrix norms are denoted $|||\cdot|||$. Unless specified, $\|\cdot\|$ and $|||\cdot|||$ denote arbitrary norms. $|||A|||_\infty = \max_i \sum_j |A_{ij}|$ is the norm induced by the $\ell_\infty$ vector norm. All inequalities are intended element-wise, i.e., they have nothing to do with positive definiteness.

Remark. The notation $I$ for interference functions does not follow the notational principles above, in order to harmonize with existing literature [34]–[36], [40].

B. Fast-Lipschitz optimization

We will now give a formal definition (slightly different from the original article [28]) of Fast-Lipschitz problems. For a thorough discussion of Fast-Lipschitz properties we refer the reader to the above mentioned paper.

Definition 1. A problem is said to be on Fast-Lipschitz form if it can be written

$$\begin{aligned} \max \ & f_0(x) \\ \text{s.t.} \ & x_i \leq f_i(x) \quad \forall i \in \mathcal{A} \\ & x_i = f_i(x) \quad \forall i \in \mathcal{B}, \end{aligned} \tag{3}$$

where
• $f_0 : \mathbb{R}^n \to \mathbb{R}^m$ is a differentiable scalar ($m = 1$) or vector valued ($m \geq 2$) function,
• $\mathcal{A}$ and $\mathcal{B}$ are complementary subsets of $\{1, \dots, n\}$,
• $f_i : \mathbb{R}^n \to \mathbb{R}$ are differentiable functions.

We will refer to problem (3) as our main problem. From the individual constraint functions we form the vector valued function $f : \mathbb{R}^n \to \mathbb{R}^n$ as $f(x) = [f_1(x) \ \cdots \ f_n(x)]^T$.

Remark 2. The characteristic feature of the Fast-Lipschitz form is a pairing such that each variable is associated with one constraint. The form $x \leq f(x)$ is general, since any constraint in canonical form, $g(x) \leq 0$, can be written $x \leq x - \gamma g(x)$ for some positive constant $\gamma$.

Definition 3. For the rest of the paper, we will restrict our attention to a bounding box $\mathcal{D} = \{x \in \mathbb{R}^n \mid a \leq x \leq b\}$. We assume $\mathcal{D}$ contains all candidates for optimality and that $f$ maps $\mathcal{D}$ into $\mathcal{D}$, $f: \mathcal{D} \to \mathcal{D}$. This box arises naturally in practice, since any real-world quantity or decision must be bounded.

Definition 4. A problem is said to be Fast-Lipschitz when it can be written on Fast-Lipschitz form and admits a unique Pareto optimal solution $x^\star$, defined as the unique solution to the system of equations

$$x^\star = f(x^\star). \tag{4}$$
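As a concrete instance of the transformation in Remark 2 (a worked illustration of our own, with a linear constraint), take $g(x) = Ax - b \leq 0$ in canonical form:

```latex
% Canonical form: g(x) = Ax - b \le 0. For any \gamma > 0, define
%   f(x) = x - \gamma g(x) = x - \gamma (Ax - b),
% so that
%   x \le f(x)  \iff  \gamma (Ax - b) \le 0  \iff  g(x) \le 0,
% i.e., the feasible sets coincide. With the gradient convention
% [\nabla f]_{ij} = \partial f_j / \partial x_i of Section II-A,
\[
  x \;\le\; f(x) \;=\; x - \gamma\,(Ax - b), \qquad
  \nabla f(x) \;=\; I - \gamma A^T .
\]
% Here \gamma acts as a design knob: for suitable A and small enough
% \gamma one can hope to satisfy a qualifying condition such as (i).
```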

A problem written on Fast-Lipschitz form is not necessarily Fast-Lipschitz. The following qualifying conditions guarantee that a problem on Fast-Lipschitz form is also Fast-Lipschitz.

Qualifying Conditions. For all $x$ in $\mathcal{D}$, $f_0(x)$ and $f(x)$ should be everywhere differentiable and fulfill at least one of the following cases (e.g., (0) and (i), or (0) and (ii)):

(0) $\nabla f_0(x) > 0$

AND

(i.a) $\nabla f(x) \geq 0$
(i.b) $|||\nabla f(x)||| < 1$

OR

(ii.a) $\nabla_i f_0(x) = \nabla_j f_0(x)$ for all $i, j$
(ii.b) $\nabla f(x)^2 \geq 0$ (e.g., $\nabla f(x) \leq 0$)
(ii.c) $|||\nabla f(x)|||_\infty < 1$

OR

(iii.a) $f_0(x) \in \mathbb{R}$ (scalar valued, $m = 1$)
(iii.b) $|||\nabla f(x)|||_\infty < \dots$
Let $\mu > 0 \in \mathbb{R}^m$, and consider the scalarized problem

$$\begin{aligned} \min \ & -\mu^T f_0(x) \\ \text{s.t.} \ & g_i(x) = x_i - f_i(x) \leq 0 \quad \forall i. \end{aligned} \tag{11}$$

Since an arbitrary scaling of $\mu$ does not change the solution of the problem, we restrict ourselves to $\mu$ fulfilling $\sum_k \mu_k = 1$. By varying $\mu$, one can obtain all Pareto optimal solutions. Introduce dual variables $\lambda \in \mathbb{R}^n$, and form the Lagrangian function $L(x, \lambda) = -\mu^T f_0(x) + \lambda^T (x - f(x))$. Since every $x \in \mathcal{D}$ in problem (11) is regular (Lemma 14), any pair $(\hat{x}, \hat{\lambda})$ of locally optimal variables must satisfy the KKT conditions (see e.g., [38, 3.1.1]). In particular, $\hat{x}$ must be a minimizer of $L(x, \hat{\lambda})$, which requires

$$\nabla_x L(\hat{x}, \hat{\lambda}) = -\nabla f_0(\hat{x})\mu + \hat{\lambda} - \nabla f(\hat{x})\hat{\lambda} = 0, \tag{12}$$

and strict complementarity must hold, i.e.,

$$\hat{\lambda}_i \left( \hat{x}_i - f_i(\hat{x}) \right) = 0 \quad \forall i. \tag{13}$$

We will soon show that the qualifying conditions imply $\hat{\lambda} > 0$. Before we continue, we state the following fact, extracted to the next lemma for clarity and brevity.

Lemma 15. Consider $n$ matrices $A_k$ and a vector $c$. If $c$ is positive ($c > 0$) and each $A_k$ is non-negative ($A_k \geq 0$) with non-zero rows (i.e., no row of $A_k$ equals the zero vector), then $A_n A_{n-1} \cdots A_0 c > 0$.

Proof: For any non-negative matrix $B \geq 0$, the product $Bc$ is non-negative. The $i$th row is given by $[Bc]_i = \sum_j B_{ij} c_j \geq 0$. If the rows of $B$ are non-zero, at least one of the terms is positive, so $Bc > 0$. The lemma follows by induction.

The following lemma is the main part of the proof of Theorem 7, and establishes that the optimal dual variable is strictly positive.

Lemma 16. Whenever at least one of qualifying conditions (Qc1)–(Qc6) holds, any pair $(\hat{x}, \hat{\lambda})$ with $\hat{x} \in \mathcal{D}$ satisfying equation (12) must have $\hat{\lambda} > 0$.

Proof: We only need to check (Qc1) and (Qc2), which generalize the other qualifying conditions.² For notational convenience, let us fix one $x \in \mathcal{D}$, which we denote $\hat{x}$, and introduce

$$A \triangleq \nabla f(\hat{x}) \in \mathbb{R}^{n \times n} \quad \text{and} \quad c \triangleq \nabla f_0(\hat{x})\mu \in \mathbb{R}^n. \tag{14}$$

Note that condition (Qc0) and Lemma 15 give $c > 0$ for every $\mu > 0$. With the new notation, equation (12) can be written as

$$-c + \hat{\lambda} - A\hat{\lambda} = 0, \quad \text{or} \quad \hat{\lambda} = (I - A)^{-1} c. \tag{15}$$

The inverse is well defined whenever $I - A = I - \nabla f(\hat{x})$ is invertible, which is true for all $x \in \mathcal{D}$ (proof of Lemma 14). The expansion of this inverse gives

$$\hat{\lambda} = (I + A + A^2 + \dots)c. \tag{16}$$

²Conditions (Qc1) and (Qc2); the other conditions imply (Qc2).
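Equations (15) and (16) are easy to sanity-check numerically; here is a minimal sketch with assumed values of $A$ and $c$ (our own, satisfying (Qc1) with $\rho(A) < 1$):

```python
import numpy as np

# A = grad f(x) >= 0 with spectral radius < 1, c = grad f0(x) mu > 0.
A = np.array([[0.1, 0.3],
              [0.2, 0.1]])
c = np.array([1.0, 2.0])

lam = np.linalg.solve(np.eye(2) - A, c)   # (15): lambda = (I - A)^{-1} c

# Truncated Neumann series (16): (I + A + A^2 + ...) c.
series = sum(np.linalg.matrix_power(A, l) @ c for l in range(60))

assert np.allclose(lam, series) and (lam > 0).all()
print(lam)   # strictly positive, as Lemma 16 asserts
```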


If qualifying condition (Qc1) holds, then (Qc1.a) gives $A \geq 0$. Therefore, the parenthesized sum in equation (16) is non-negative, with each diagonal element greater than or equal to 1. By Lemma 15, $\hat{\lambda} > 0$.

If instead qualifying condition (Qc2) holds, factor equation (15) as

$$\hat{\lambda} = (I + A^k + A^{2k} + \dots)(I + A + \dots + A^{k-1})c = \sum_{l=0}^{\infty} (A^k)^l (I + A + \dots + A^{k-1})c. \tag{17}$$

Condition (Qc2.b) ensures that each $(A^k)^l$ is non-negative, wherefore the infinite sum in expression (17) is a non-negative matrix with non-zero rows (guaranteed by the first term $A^0 = I$). By Lemma 15, a sufficient condition for $\hat{\lambda} > 0$ is

$$(I + \underbrace{A + \dots + A^{k-1}}_{\triangleq B})\,c > 0 \iff -Bc < c. \tag{18}$$

Let $c_{\min}$ and $c_{\max}$ be the minimum and maximum elements of $c$ and consider row $i$ of inequality (18). We can bound the right side by $c_{\min} \leq c_i$ and the left side by

$$[-Bc]_i = -\sum_{j=1}^n B_{ij} c_j \leq \sum_{j=1}^n |B_{ij}||c_j| \leq \max_i \sum_{j=1}^n |B_{ij}| \, c_{\max} = |||B|||_\infty \, c_{\max}. \tag{19}$$

Therefore, equation (18) holds if $|||B|||_\infty c_{\max} < c_{\min}$, or

$$|||B|||_\infty < \frac{c_{\min}}{c_{\max}}. \tag{20}$$

Let $\nabla_i f_0(\hat{x})$ be the $i$th row of $\nabla f_0(\hat{x})$ and define

$$a(\mu) = \operatorname*{argmin}_{\nabla_i f_0(\hat{x})} \nabla_i f_0(\hat{x})\mu \quad \text{and} \quad b(\mu) = \operatorname*{argmax}_{\nabla_i f_0(\hat{x})} \nabla_i f_0(\hat{x})\mu.$$

Let $d_k(\mu) = a_k(\mu)/b_k(\mu)$ and express the components of $a$ as $a_k(\mu) = d_k(\mu) b_k(\mu)$. Since $c = \nabla f_0(\hat{x})\mu$, we have

$$c_{\min} = a(\mu)\mu = \sum_k a_k(\mu)\mu_k = \sum_k d_k(\mu) b_k(\mu) \mu_k \quad \text{and} \quad c_{\max} = b(\mu)\mu = \sum_k b_k(\mu)\mu_k.$$

The fraction in (20) can therefore be bounded by

$$\frac{c_{\min}}{c_{\max}} = \frac{\sum_k a_k(\mu)\,\mu_k}{\sum_k b_k(\mu)\,\mu_k} = \frac{\sum_k d_k(\mu)\, b_k(\mu)\, \mu_k}{\sum_k b_k(\mu)\,\mu_k} \geq \frac{\sum_k d_{\min}(\mu)\, b_k(\mu)\, \mu_k}{\sum_k b_k(\mu)\,\mu_k} = d_{\min}(\mu), \tag{21}$$

where

$$d_{\min}(\mu) = \min_k d_k(\mu) = \min_k \frac{a_k(\mu)}{b_k(\mu)} \geq \min_k \frac{\min_i [\nabla f_0(\hat{x})]_{ik}}{\max_i [\nabla f_0(\hat{x})]_{ik}} = q(\hat{x}), \tag{22}$$

with $q(x)$ defined in (6). By the definition of $B$ and condition (Qc2.c) we get

$$|||B|||_\infty = \left|\left|\left| \sum_{l=1}^{k-1} \nabla f(\hat{x})^l \right|\right|\right|_\infty < q(\hat{x}),$$

which together with inequalities (20)–(22) ensures

$$|||B|||_\infty < q(\hat{x}) \leq d_{\min} \leq \frac{c_{\min}}{c_{\max}},$$

wherefore $\hat{\lambda} > 0$ by inequalities (18)–(20) and Lemma 15.

We now know that every pair $(\hat{x}, \hat{\lambda})$ satisfying the KKT conditions must have $\hat{\lambda} > 0$, provided the qualifying conditions hold. Furthermore, strict complementarity (equation (13)) must hold. Since $\hat{\lambda}_i > 0$, we have $\hat{\lambda}_i(\hat{x}_i - f_i(\hat{x})) = 0$ only if $\hat{x}_i - f_i(\hat{x}) = 0$, i.e., $\hat{x} = f(\hat{x})$. In other words, any candidate for primal optimality must be a fixed point of $f(x)$. It remains to show that there always exists a fixed point, and that this fixed point is unique. But this is already done, since the qualifying conditions and Lemma 11 ensure $f(x)$ is contractive, and the result follows from Proposition 10.

By now, we have shown that the unique point $x^\star = f(x^\star)$ is the only possible optimum of problem (11). Since this is true for all scalarization vectors $\mu$, $x^\star$ is the unique Pareto optimal point of problem (8). Finally, Lemma 12 extends this to the originally considered problem (3). By this, we have taken all the steps to prove Theorem 7. ∎

The remaining parts of this section will loosen some of the assumptions on the problem structure, i.e., investigate problems not entirely on Fast-Lipschitz form.
D. Additional constraints

In this section we complement the Fast-Lipschitz form (3) with an additional set $\mathcal{X}$. Hence, we consider the following problem:

$$\begin{aligned} \max \ & f_0(x) \\ \text{s.t.} \ & x_i \leq f_i(x) \quad \forall i \in \mathcal{A} \\ & x_i = f_i(x) \quad \forall i \in \mathcal{B} \\ & x \in \mathcal{X}. \end{aligned} \tag{23}$$

Proposition 17. If at least one of qualifying conditions (Qc1)–(Qc6) holds, and $\mathcal{X}$ contains a point $x^\star = f(x^\star)$, then problem (23) is Fast-Lipschitz.

Proof: Relax the problem by removing the new constraint $x \in \mathcal{X}$. The relaxed problem is our main problem (3), whereby the qualifying conditions and Theorem 7 ensure $x^\star = f(x^\star)$ is the unique optimum. Since $x^\star \in \mathcal{X}$, this is also the unique optimum of problem (23).

In theory, we can handle any set $\mathcal{X}$ provided we can show $x^\star \in \mathcal{X}$. For example, $\mathcal{X}$ does not need to be bounded, convex, or even connected. In practice however, the most common form of $\mathcal{X}$ is a box constraint, for example a requirement of non-negativity, $\mathcal{X} = \{x : 0 \leq x\}$. In these cases, $\mathcal{X}$ becomes the natural choice for the imagined bounding box $\mathcal{D}$.
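As an illustration, the relax-then-verify recipe of Proposition 17 in code form (a sketch under assumed data: the contractive $f$ and the box $\mathcal{X}$ below are our own choices):

```python
import numpy as np

# Step 1: solve the relaxed problem x* = f(x*) by fixed-point iteration.
def f(x):
    return 0.5 + 0.2 * np.sin(x)    # contractive: |f'| <= 0.2 < 1

x = np.zeros(2)
for _ in range(100):
    x = f(x)

# Step 2: verify that x* lies in the extra constraint set X = [0, 1]^2.
in_X = (x >= 0).all() and (x <= 1).all()
print(x, in_X)   # if in_X is True, x* is also optimal for problem (23)
```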


E. Fewer constraints than variables – Constant constraints

The Fast-Lipschitz form in problem (3) requires one (and only one) constraint $f_i$ for each variable $x_i$. In this section we look at the case when the number of constraints ($f_i$) is smaller than the number of variables. We will assume that the individual variables are upper and lower bounded, which is always the case for problems of engineering interest, such as wireless networks. This means we get an extra constraint set $\mathcal{X}$ as discussed in Section III-D. We treat the case when all constraints are inequalities; it follows from Lemma 12 that the following results are true also for problems with equality constraints.

Consider a partitioned variable $x = [y^T \ z^T]^T \in \mathcal{X} \subset \mathbb{R}^n$ and the problem

$$\begin{aligned} \max \ & f_0(x) \\ \text{s.t.} \ & y \leq f_y(x) \\ & x \in \mathcal{X}, \end{aligned} \tag{24}$$

with $\mathcal{X} = \{x : y \in \mathcal{X}_y, \ z \in \mathcal{X}_z\}$, where $\mathcal{X}_y = \{y : a_y \leq y \leq b_y\}$ and $\mathcal{X}_z = \{z : a_z \leq z \leq b_z\}$.

In the formulation above, there are no constraints $f_i$ for the variables $z_i$. However, by enforcing $z \in \mathcal{X}_z$ twice we get the equivalent problem

$$\begin{aligned} \max \ & f_0(x) \\ \text{s.t.} \ & y \leq f_y(x) \\ & z \leq f_z(x) = b_z \\ & x \in \mathcal{X}. \end{aligned} \tag{25}$$

This problem has the right form (23) and is Fast-Lipschitz if

$$\nabla f = \begin{bmatrix} \nabla_y f_y(x) & \nabla_y f_z(x) \\ \nabla_z f_y(x) & \nabla_z f_z(x) \end{bmatrix} \quad \text{and} \quad \nabla f_0 = \begin{bmatrix} \nabla_y f_0(x) \\ \nabla_z f_0(x) \end{bmatrix}$$

fulfill one of the qualifying conditions (Qc1)–(Qc6). Since $f_z(x) = b_z$ is constant, $\nabla f(x)$ simplifies to

$$\nabla f = \begin{bmatrix} \nabla_y f_y(x) & 0 \\ \nabla_z f_y(x) & 0 \end{bmatrix}.$$

The special structure of $\nabla f_0(x)$ can be exploited to construct less restrictive qualifying conditions. First, consider problem (24) with a fixed $z \in \mathcal{X}_z$. The problem can be written

$$\begin{aligned} \max \ & f_{0|z}(y) \\ \text{s.t.} \ & y \leq f_{y|z}(y) \\ & y \in \mathcal{X}_y. \end{aligned} \tag{26}$$

We will refer to problem (26) as the subproblem $(f_{0|z}, f_{y|z})$.

Proposition 18. Consider problem (25). Suppose that
(a) the subproblem $(f_{0|z}, f_{y|z})$ fulfills at least one of qualifying conditions (Qc1)–(Qc6) for all $z \in \mathcal{X}_z$,
and that, for all $x \in \mathcal{X}$, either
(b) $\nabla_z f_0(x) \geq 0$, and $\nabla_z f_y(x) \geq 0$ with non-zero rows, or
(c) $\nabla_z f_0(x) \geq 0$ with non-zero rows, and $\nabla_z f_y(x) \geq 0$, or
(d) $$\frac{|||\nabla_z f_y(x)|||_\infty}{1 - |||\nabla_y f_y(x)|||_\infty} < \frac{\delta_z(x)}{\Delta_y(x)},$$
where $\delta_z(x) = \min_{ij} [\nabla_z f_0(x)]_{ij}$ and $\Delta_y(x) = \max_{ij} [\nabla_y f_0(x)]_{ij}$.
Then, problem (24) is Fast-Lipschitz.

Proof: We refer to the arguments of Section III-C, where we modify Lemma 16 as follows. The particular form and partitioning remain in the definitions (14), giving

$$A = \begin{bmatrix} \nabla_y f_y(\hat{x}) & 0 \\ \nabla_z f_y(\hat{x}) & 0 \end{bmatrix} = \begin{bmatrix} A_{11} & 0 \\ A_{21} & 0 \end{bmatrix} \tag{27}$$

and

$$c = \begin{bmatrix} \nabla_y f_0(\hat{x}) \\ \nabla_z f_0(\hat{x}) \end{bmatrix} \mu = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}. \tag{28}$$

Consider again equation (15) and denote $E \triangleq (I - A)^{-1}$, i.e., $\lambda = Ec$. As in the proof of Lemma 16, $E$ is well defined if $\rho(A) < 1$. This is the case, since the eigenvalues of a block triangular matrix are the union of the eigenvalues of the diagonal blocks, wherefore $\rho(A) = \rho(A_{11}) < 1$ by assumption (a). As in the proof of Lemma 16, we must show $\lambda > 0$. This time we will make use of the block structure of $A$ and $c$.³ From the block matrix inverse formula we get

$$\lambda = (I - A)^{-1} c = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix},$$

where $E_{11} = (I - A_{11})^{-1}$, $E_{12} = 0$, $E_{21} = A_{21} E_{11}$ and $E_{22} = I$, i.e.,

$$\begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} E_{11} c_1 \\ E_{21} c_1 + c_2 \end{bmatrix} = \begin{bmatrix} E_{11} c_1 \\ A_{21} E_{11} c_1 + c_2 \end{bmatrix}.$$

Note that $\lambda_1 = E_{11} c_1 = (I - A_{11})^{-1} c_1$, so $\lambda_1 > 0$ since the subproblem $(f_{0|z}, f_{y|z})$ fulfills Lemma 16 by assumption (a). The second block component is

$$\lambda_2 = A_{21} E_{11} c_1 + c_2 = A_{21} \lambda_1 + c_2. \tag{29}$$

Given that $\lambda_1 > 0$, we need to show $\lambda_2 > 0$ if any of assumptions (b), (c), or (d) holds. We start with assumption (b), which ensures $c_2 \geq 0$ and $A_{21} \geq 0$ with non-zero rows (so $A_{21}\lambda_1 > 0$ by Lemma 15), wherefore $\lambda_2 > 0$ by equation (29). Assumption (c) assures $A_{21} \geq 0$ and $c_2 > 0$, wherefore $\lambda_2 > 0$ by equation (29). Finally, assuming (d) is fulfilled, we see that $\lambda_2 > 0$ if $c_2 > -A_{21} E_{11} c_1$. Analogously to equations (19) and (20), this holds if $\min_i [c_2]_i > |||A_{21} E_{11}|||_\infty \max_i [c_1]_i$, or, since $c_1 > 0$ and $E_{11} = (I - A_{11})^{-1} = \sum_{k=0}^\infty A_{11}^k$, when

$$\frac{\min_i [c_2]_i}{\max_i [c_1]_i} > \left|\left|\left| A_{21} \sum_{k=0}^\infty A_{11}^k \right|\right|\right|_\infty. \tag{30}$$

By the triangle inequality, the sub-multiplicative property of matrix norms and the geometric series, the right side of equation (30) can be upper bounded by

$$\frac{|||A_{21}|||_\infty}{1 - |||A_{11}|||_\infty} \geq \left|\left|\left| A_{21} \sum_{k=0}^\infty A_{11}^k \right|\right|\right|_\infty. \tag{31}$$

Finally, the definitions of $c_i$ and $\sum_j \mu_j = 1$ give

$$\frac{\min_i [c_2]_i}{\max_i [c_1]_i} = \frac{\min_i \sum_j [\nabla_z f_0(x)]_{ij} \mu_j}{\max_i \sum_j [\nabla_y f_0(x)]_{ij} \mu_j} \geq \frac{\min_{ij} [\nabla_z f_0(x)]_{ij} \sum_j \mu_j}{\max_{ij} [\nabla_y f_0(x)]_{ij} \sum_j \mu_j} = \frac{\delta_z(x)}{\Delta_y(x)}. \tag{32}$$

By combining inequalities (30)–(32), a sufficient condition ensuring $\lambda_2 > 0$ is

$$\frac{|||A_{21}|||_\infty}{1 - |||A_{11}|||_\infty} < \frac{\delta_z(x)}{\Delta_y(x)},$$

which is guaranteed by assumption (d). This concludes the proof.

³Formulas for the inverse of a block matrix, as well as the products of two block matrices, can be found in [44].
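A small sketch of the padding device in problem (25) (the affine constraint and all numbers are our own illustration): the unconstrained variables $z$ are given the constant constraint $f_z(x) = b_z$, after which the usual fixed-point iteration runs on the full vector:

```python
import numpy as np

b_z = np.array([1.0])                  # box upper bound for z

def f_y(x):
    y, z = x[:1], x[1:]
    return 0.4 * y + 0.3 * z           # illustrative contractive f_y

def f(x):
    # Padded constraint function of problem (25): f_z(x) = b_z constant.
    return np.concatenate([f_y(x), b_z])

x = np.zeros(2)
for _ in range(60):
    x = f(x)

print(x)   # y* = 0.4 y* + 0.3 b_z  =>  y* = 0.5, and z* = b_z = 1
```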


F. Non-strictly monotonic objective function – Variables missing in objective function

Sometimes it is practical or necessary to formulate problems where not all variables appear in the objective function. For example, the problem

$$\begin{aligned} \max \ & f_0(x) \\ \text{s.t.} \ & x \leq f_x(x, z) \\ & z \leq f_z(x, z) \end{aligned}$$

has some variables that do not affect the cost function and is not on Fast-Lipschitz form. Redefining $f_0 = f_0(x, z)$ gives a problem of the right form (3), but $\nabla_z f_0(x, z) = 0$ everywhere. Therefore, condition (Qc0) and Theorem 7 cannot be used to classify the problem as Fast-Lipschitz.

The situation above is a special case of the following problem. Consider a partitioned optimization variable $(x, z)$ and the problem

$$\begin{aligned} \max \ & f_0(x, z) \\ \text{s.t.} \ & x \leq f_x(x, z) \\ & z \leq f_z(x, z). \end{aligned} \tag{33}$$

Suppose $f_0$ is monotonic, i.e., $\nabla f_0 \geq 0$, but partitioned such that

$$\nabla f_0(x, z) = \begin{bmatrix} \nabla_x f_0(x, z) \\ \nabla_z f_0(x, z) \end{bmatrix},$$

where $\nabla_x f_0(x, z)$ has non-zero rows for all $(x, z) \in \mathcal{D}$, while $\nabla_z f_0(x, z)$ can have zero rows. By partitioning the objective function gradient, one can find situations when problem (33) is actually Fast-Lipschitz.

Proposition 19. Consider problem (33). If it holds, for all $x$ in $\mathcal{D}$, that
(a) $\nabla_x f_0(x, z) > 0$ and $\nabla_z f_0(x, z) \geq 0$,
(b) $\nabla f(x, z) = \begin{bmatrix} \nabla_x f_x(x, z) & \nabla_x f_z(x, z) \\ \nabla_z f_x(x, z) & \nabla_z f_z(x, z) \end{bmatrix} \geq 0$,
(c) $|||\nabla f(x)||| < 1$ or $\rho(\nabla f(x)) < 1$, and
(d) $\nabla_z f_x(x, z)$ has non-zero rows,
then problem (33) is Fast-Lipschitz.

Remark 20. The condition that the $i$th row of $\nabla_z f_x(x, z)$ is non-zero means that an increase in the variable $z_i$ will allow an increase of some variable $x_j$, which in turn will influence the objective.

Proof: The proof of Theorem 7 can be reused, with some alterations to Lemma 16. The partitioning of $\nabla f$ and $\nabla f_0$ remains in $A$ and $c$, i.e.,

$$A = \begin{bmatrix} \nabla_x f_x(x, z) & \nabla_x f_z(x, z) \\ \nabla_z f_x(x, z) & \nabla_z f_z(x, z) \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

and

$$c = \begin{bmatrix} \nabla_x f_0(x, z) \\ \nabla_z f_0(x, z) \end{bmatrix} \mu = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$$

Assumption (a) and $\mu > 0$ give $c_1 > 0$ and $c_2 \geq 0$. Just as in case (Qc1) of Lemma 16, assumptions (b) and (c) guarantee the existence and non-negativity of $E = (I - A)^{-1}$, so $\lambda = (I - A)^{-1} c \geq 0$ is well defined and non-negative.⁴ Thus, it remains to show $\lambda = Ec > 0$.

Expressing the inverse $E = (I - A)^{-1}$ block-wise, we have

$$(I - A)^{-1} = \begin{bmatrix} I - A_{11} & -A_{12} \\ -A_{21} & I - A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix},$$

where $E_{11} = \left( I - A_{11} - A_{12}(I - A_{22})^{-1} A_{21} \right)^{-1}$ and $E_{21} = (I - A_{22})^{-1} A_{21} E_{11}$. We now have

$$\begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix} E_{11} c_1 \\ E_{21} c_1 \end{bmatrix} + \begin{bmatrix} E_{12} \\ E_{22} \end{bmatrix} c_2.$$

Since $E \geq 0$, all blocks $E_{ij}$ are non-negative. The second term is always non-negative and can be ignored; it is enough to show that the first term is strictly positive. Since $c_1 > 0$ and $E_{11} \geq 0$, we have that $\lambda_1 > 0$ if $E_{11}$ has non-zero rows (Lemma 15). This is always the case, since $E_{11}$ (defined as an inverse) is invertible. The second component can be expressed in terms of the first component: $\lambda_2 = (I - A_{22})^{-1} A_{21} \lambda_1$. When $\lambda_1 > 0$, $A_{21} \lambda_1 > 0$ if $A_{21} = \nabla_z f_x(x, z)$ has non-zero rows (Lemma 15). This is true by assumption (d). Since $(I - A_{22})^{-1}$ is invertible it has non-zero rows, and $\lambda_2 = (I - A_{22})^{-1} (A_{21} \lambda_1) > 0$, which concludes the proof.

⁴In contrast to Lemma 16, we only have the weak inequality $c \geq 0$.

We end this section with a general example.

Example: Start from problem (3). For lighter notation, we assume all constraints are inequalities. Transforming the problem to an equivalent problem on epigraph form gives

$$\begin{aligned} \max \ & t \\ \text{s.t.} \ & t \leq f_0(x) \\ & x \leq f(x). \end{aligned}$$

This problem has a (non-strictly) monotonic objective, regardless of $f_0(x)$. By writing this as

$$\begin{aligned} \max \ & g_0(t, x) = t \\ \text{s.t.} \ & t \leq g_t(t, x) = f_0(x) \\ & x \leq g_x(t, x) = f(x) \end{aligned}$$

we obtain

$$\nabla g_0(t, x) = \begin{bmatrix} \nabla_t g_0(t, x) \\ \nabla_x g_0(t, x) \end{bmatrix} = \begin{bmatrix} I \\ 0 \end{bmatrix}, \quad \text{and} \quad \nabla g(t, x) = \begin{bmatrix} \nabla_t g_t(t, x) & \nabla_t g_x(t, x) \\ \nabla_x g_t(t, x) & \nabla_x g_x(t, x) \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ \nabla f_0(x) & \nabla f(x) \end{bmatrix}.$$

Proposition 19 can now be applied, and the problem is Fast-Lipschitz if $\nabla g \geq 0$, $\rho(\nabla g) < 1$ and $\nabla f_0$ has non-zero rows. Since the eigenvalues of a block triangular matrix are the union of the eigenvalues of the diagonal blocks, we have $\rho(\nabla g) = \rho(\nabla f)$, wherefore the problem is Fast-Lipschitz if
• $\nabla f(x) \geq 0$ and $|||\nabla f(x)||| < 1$, and
• $\nabla f_0(x) \geq 0$ with non-zero rows.
This is precisely qualifying condition (Qc1) applied to the original problem (3), i.e., no generalization was achieved by considering the epigraph form of the problem.

This concludes the main part of the paper. In the following section we will illustrate the new theory with two examples.

IV. EXAMPLES

We begin by illustrating the novel results established in this paper through a non-convex optimization example. Next, in Section IV-B, we apply the Fast-Lipschitz theory to opportunistic uplink radio power control.


A. Simple non-convex example

Consider the problem

$$\begin{aligned} \max \ & f_0(x) \\ \text{s.t.} \ & x \leq f(x) \\ & x \in \mathcal{X} = \{x : 0 \leq x \leq 1\}, \end{aligned} \tag{34}$$

where $x \in \mathbb{R}^2$,

$$f_0(x) = \begin{bmatrix} 2x_1 + x_2 \\ x_1 + 2x_2 \end{bmatrix}, \quad \text{and} \quad f(x) = 0.5 \begin{bmatrix} 1 + a x_2^2 \\ 1 + b x_1^2 \end{bmatrix}.$$

If either $a$ or $b$ is positive, the problem is not convex (the canonical constraint functions $x_i - f_i(x)$ become concave). At this point we do not know if $x^\star = f(x^\star) \in \mathcal{X}$. However, following the results of Section III-D, we can assume this is the case and examine whether

$$\begin{aligned} \max \ & f_0(x) \\ \text{s.t.} \ & x \leq f(x) \end{aligned} \tag{35}$$

fulfills any of the qualifying conditions. The qualifying conditions must apply in the box $\mathcal{D}$, which we select equal to $\mathcal{X}$ (no point outside of $\mathcal{D} = \mathcal{X}$ is feasible, hence all feasible points lie in $\mathcal{D}$). The gradients of the cost and constraint functions are

$$\nabla f_0 = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \quad \text{and} \quad \nabla f = \begin{bmatrix} 0 & b x_1 \\ a x_2 & 0 \end{bmatrix}.$$

Since $\nabla f_0 > 0$ for all $x$, condition (Qc0) is always fulfilled. We will now check all sign combinations of $a$ and $b$.

$a, b \geq 0$: If both $a$ and $b$ are non-negative, $\nabla f(x) \geq 0$ for all $x$ in $\mathcal{D}$ and condition (Qc1.a) holds. To verify condition (Qc1.b), one must find a norm $|||\cdot|||$ such that $|||\nabla f(x)||| < 1$ for all $x$ in $\mathcal{D}$. When the $\infty$-norm is used, we get

$$\max_{x \in \mathcal{D}} |||\nabla f(x)|||_\infty = \max_{x \in \mathcal{D}} \max\{|b x_1|, |a x_2|\} \leq \max\{b, a\} < 1$$

if $a, b < 1$. Thus, when $0 \leq a, b < 1$, the problem is Fast-Lipschitz by (Qc1).

$a, b \leq 0$: If $a$ and $b$ are instead non-positive, we have $\nabla f(x) \leq 0$ for all $x$ in $\mathcal{D}$ and condition (Qc5.a) is fulfilled. In order to verify (Qc5.b), we need $\delta(x)$ and $\Delta(x)$. These are defined pointwise in $x$, as the smallest and largest (in absolute value) elements of $\nabla f_0$, i.e., $\delta(x) = 1$ and $\Delta(x) = 2$ for all $x$. Condition (Qc5.b) now requires, for all $x$ in $\mathcal{D}$, that

$$|||\nabla f(x)|||_\infty < \frac{\delta(x)}{\Delta(x)} = \frac{1}{2}.$$

From the previous case we know that $|||\nabla f(x)|||_\infty \leq \max\{|a|, |b|\}$ for all $x \in \mathcal{D}$, so the problem is guaranteed Fast-Lipschitz by qualifying condition (Qc5), provided that $-1/2 < a, b \leq 0$. Note that this would not have met the old case (ii) in (5d), since the rows of $\nabla f_0(x)$ are not identical.

$ab < 0$: When $a$ and $b$ have different signs, neither (Qc1.a) nor (Qc5.a) holds. Instead, one can try qualifying condition (Qc6), which does not place any sign restrictions on $\nabla f(x)$. It is only required, in (Qc6.a), that

$$|||\nabla f(x)|||_\infty < \frac{\delta(x)}{\delta(x) + \Delta(x)}.$$

These quantities are unchanged from the previous cases, so $\delta(x) = 1$, $\Delta(x) = 2$ and $|||\nabla f(x)|||_\infty \leq \max\{|a|, |b|\}$, so the problem is Fast-Lipschitz by qualifying condition (Qc6) if both $|a|$ and $|b|$ are less than $1/3$. Also in this case, the old qualifying conditions would not have worked, since case (iii) in (5g) requires a scalar cost function.

Solution of the problem: If problem (35) is Fast-Lipschitz by any of the cases above, the optimal point $x^\star$ is found by solving $x^\star = f(x^\star)$. We now solve the problem when $a = -0.3$ and $b = 0.3$, by iterating

$$x^{k+1} := f(x^k). \tag{36}$$

This sequence will converge to $x^\star = f(x^\star)$, since the qualifying conditions imply that $f$ is contractive (Lemma 11). The iterates $x^k$ of (36), together with the feasible region of the problem, are shown in Fig. 1a. Clearly, $x^\star = f(x^\star) \in \mathcal{X}$, so Proposition 17 applies and $x^\star$ is optimal also for the original problem (34). The convergence of the iterations (36) is shown in Fig. 1b.

Fig. 1: Plots from the simple example in Section IV-A. (a) The feasible region of problem (35) with $a = -0.3$ and $b = 0.3$ (making the feasible region non-convex); the iterates (36) quickly converge to $x^\star = f(x^\star)$, where all constraints are active. (b) Convergence of the iterates (36), measured in the $\infty$-norm; the convergence is geometric (linear in the log domain) and the optimal solution is found within an accuracy of $10^{-6}$ after 8 iterations.
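The iteration itself takes only a few lines of code; here is a sketch reproducing the solution above:

```python
import numpy as np

a, b = -0.3, 0.3

def f(x):
    # Constraint function of problem (34)/(35).
    return 0.5 * np.array([1 + a * x[1]**2,
                           1 + b * x[0]**2])

x = np.zeros(2)
for k in range(20):
    x = f(x)                    # iteration (36): x^{k+1} := f(x^k)

print(x)                        # x* = f(x*), with 0 <= x* <= 1
print(np.allclose(x, f(x)))    # True: fixed point reached
```

Since the computed $x^\star$ lands inside $\mathcal{X} = [0, 1]^2$, Proposition 17 confirms that it is the optimum of the original problem (34).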

B. Uplink power control with outages

The following example, inspired by [45], presents an interference function $I_i(p)$ that is neither standard, Type-II standard, nor standard in the sense of [40]. However, it can be analyzed within the framework given in this paper. The main result of this section consists in showing under which conditions the iterations with the interference function considered in [45] lead to optimal solutions, which was previously unknown.


We start with the power control problem (1), but with a more general objective $g_0(p)$:

$$\begin{aligned} \min \ & g_0(p) \\ \text{s.t.} \ & p_i \geq I_i(p) \quad \forall i. \end{aligned} \tag{37}$$

Problem (1) is recovered when $g_0(p) = p$. In what follows we consider any $g_0(p)$ that is increasing in $p$, i.e., $\nabla g_0(p) \geq 0$.

In [45], two complications are added to the interference function, under which the sequence (2) is no longer ensured to converge by the standard assumptions of Interference Function optimization. The first complication is that in wireless communication systems there is an upper limit $b_i$ on the power that transmitter $i$ can generate. When such a limit is reached, no transmission is done, because a transmission with $p_i < I_i(p)$ is likely to fail, and not transmitting reduces the interference for the neighboring receivers. This strategy, which is called cut-off or outage, is also used in opportunistic algorithms [39], where priority is given to the links with better wireless channels. The cut-off is modeled by the function

$$h_i(x) = \begin{cases} x & \text{if } x \leq b_i, \\ 0 & \text{if } x > b_i, \end{cases}$$

and the transmit power sequence (2) now becomes $p_i^{k+1} := h_i(I_i(p^k))$.

The second complication is that [45] considers a radio power control model with stochastic channel fading. There, it is easy to show that the interference requirement is $p_i \geq I_i(p)/F_i$, where $F_i$ is the stochastic channel fading and $I_i(p)$ is the average interference of link $i$. Similarly to iteration (2), each transmitter $i$ now updates its power as

$$p_i^{k+1} := \Phi_i(I_i(p^k)) \triangleq \mathbb{E}_{F_i}\!\left[ h_i\!\left( \frac{I_i(p^k)}{F_i} \right) \right]. \tag{38}$$

The iterations in (38) can be seen as a possible solution algorithm for a standard power control problem with interference functions $\hat{I}_i(p) = \Phi_i(I_i(p))$, i.e.,

$$\begin{aligned} \min \ & g_0(p) \\ \text{s.t.} \ & p_i \geq \Phi_i(I_i(p)) \quad \forall i. \end{aligned} \tag{39}$$

However, the new constraint functions $\Phi_i(I_i(p))$ are not monotonic, wherefore the theory of standard interference functions no longer guarantees optimality or even convergence. In [45] it is shown that, if for each $i$
• $I_i(p)$ is standard, and
• $\Phi_i(x) = \mathbb{E}_{F_i}[h_i(x/F_i)]$ is univariate, bounded and absolutely subhomogeneous, meaning $e^{-|a|} \Phi_i(x) \leq \Phi_i(e^a x) \leq e^{|a|} \Phi_i(x)$ for all $x > 0$ and $a > 0$,
then the sequence (38) will converge to a fixed point. However, nothing is said about the optimality of this fixed point. The fundamental question is whether these iterations lead to an optimal solution. We will now present conditions on the statistical distribution of $F_i$ under which problem (39) is Fast-Lipschitz, whereby both convergence and optimality are guaranteed. By introducing $x = -p$, we can rewrite it on Fast-Lipschitz form as

$$\begin{aligned} \max \ & f_0(x) = -g_0(-x) \\ \text{s.t.} \ & x_i \leq f_i(x) = -\Phi_i(I_i(-x)) \quad \forall i. \end{aligned} \tag{40}$$
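A hedged sketch of iteration (38), with the expectation estimated by Monte Carlo; the affine interference model, power limits and Rayleigh parameter are our own assumptions, chosen to be consistent with the conditions derived below:

```python
import numpy as np

rng = np.random.default_rng(0)

G = np.array([[0.0, 0.1],
              [0.1, 0.0]])          # cross-link gains, I(p) = G p + eta
eta = np.array([0.2, 0.2])
b = np.array([5.0, 5.0])            # power limits b_i (cut-off levels)
sigma = 1.3                         # Rayleigh parameter, >= sqrt(pi/2)

def I(p):
    return G @ p + eta

def Phi(x, samples=200_000):
    # Phi_i(x_i) = E_F[h_i(x_i / F)], estimated by sampling F ~ Rayleigh.
    F = rng.rayleigh(scale=sigma, size=samples)
    ratio = x[:, None] / F[None, :]
    h = np.where(ratio <= b[:, None], ratio, 0.0)   # cut-off h_i
    return h.mean(axis=1)

p = np.zeros(2)
for k in range(30):
    p = Phi(I(p))                   # p^{k+1} := Phi_i(I_i(p^k))

print(p)    # approximates the fixed point p* of problem (39)
```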

We will not consider a specific objective function $f_0(x)$ (or $g_0(p)$), but will assume that it is strictly monotonic, i.e., $\nabla f_0(x) > 0$ for all $x$. If this problem is Fast-Lipschitz, we know that the limit of sequence (38) is the unique optimal point. The non-monotonic properties of $\Phi_i$ are inherited by $f_i(x)$. Therefore, neither qualifying condition (Qc1) nor (Qc3) applies. Instead we can use (Qc4), and thus need an upper bound on $|||\nabla f(x)|||_\infty$, where $\nabla f(x)$ is the gradient of $f = [f_1 \dots f_n]^T$ with respect to $x$.

Result 21. Let $\varphi_i$ be the pdf of the channel fading coefficient $F_i$, and define

$$\Omega_i(z) = \int_z^\infty \frac{\varphi_i(y)}{y} \, dy - \varphi_i(z). \tag{41}$$

Then, the constraint function norms of problems (40) and (37) fulfill

$$|||\nabla f(x)|||_\infty \leq \max_i \left| \Omega_i(I_i(p)/b_i) \right| \cdot |||\nabla I(p)|||_\infty.$$

Proof: To avoid tedious notation, we first consider the derivatives $\partial \Phi_i / \partial p_i$. Also, since the functions $h_i$, $F_i$ and $\Phi_i$ are all specific to mobile $i$, we temporarily drop the subscripts $i$ and the variable dependency indication. For example, $\Phi_i$ of equation (38) becomes $\Phi = \mathbb{E}[h(I/F)]$. Denoting the pdf of $F$ by $\varphi(y)$, we have

$$\Phi = \mathbb{E}[h(I/F)] = \int_0^\infty h(I/y) \, \varphi(y) \, dy = I \int_{I/b}^\infty \frac{\varphi(y)}{y} \, dy,$$

since $h(I/y) = 0$ when $y < I/b$, and

$$\frac{d\Phi}{dI} = \frac{d}{dI} \left( I \int_{I/b}^\infty \frac{\varphi(y)}{y} \, dy \right) = \int_{I/b}^\infty \frac{\varphi(y)}{y} \, dy - \varphi(I/b) \triangleq \Omega(I/b). \tag{42}$$

Returning to full notation, we can now form

$$\frac{\partial f_i(x)}{\partial x_i} = -\frac{\partial}{\partial x_i} \Phi_i\!\left(I_i(p)\right) = -\frac{d\Phi_i}{dI_i} \frac{\partial I_i(p)}{\partial p_i} \frac{dp_i}{dx_i} = \Omega_i(I_i(p)/b_i) \, \frac{\partial I_i(p)}{\partial p_i}.$$

It follows that

$$|||\nabla f(x)|||_\infty = \max_i \sum_{j \neq i} \left| \Omega_i(I_i(p)/b_i) \, \frac{\partial I_i(p)}{\partial p_j} \right| \leq \max_i \left| \Omega_i(I_i(p)/b_i) \right| \cdot \max_i \sum_{j \neq i} \left| \frac{\partial I_i(p)}{\partial p_j} \right| = \max_i \left| \Omega_i(I_i(p)/b_i) \right| \cdot |||\nabla I(p)|||_\infty.$$

Result 21 is true regardless of the underlying interference model $I(p)$, and can be used as follows.

Corollary 22. Suppose optimization problem (37) fulfils qualifying condition (Qc4.a) up to a scaling factor $\alpha > 0$, i.e.,

$$\alpha \, |||\nabla I(p)|||_\infty < \frac{q(p)}{1 + q(p)}.$$

Then, the smoothed problem (40) is Fast-Lipschitz if $\max_{i,z} |\Omega_i(z)| \leq \alpha$.

This corollary allows us to say that problem (40), regardless of the underlying interference model (37), is Fast-Lipschitz if

$$\max_{i,z} |\Omega_i(z)| \cdot |||\nabla I(p)|||_\infty < \frac{q(p)}{1 + q(p)} \quad \forall p.$$


For fading coefficients from an arbitrary distribution, the function $\Omega_i(z)$ in equation (41) might not be expressible in closed form. However, the maximum value of $\Omega_i(z)$ can be found through numerical calculations. A particular distribution, which allows analytical inspection, is examined next.

1) Rayleigh fading: Assume $F_i$ follows a Rayleigh distribution with parameter $\sigma_i$ and with pdf

$$\varphi_i(y) = \frac{y}{\sigma_i^2} e^{-y^2/2\sigma_i^2}, \quad \sigma_i > 0.$$

Recalling the definition of $\Omega(z)$ in (41), we calculate the first term of $\Omega(z)$ as

$$\int_z^\infty \frac{\varphi_i(y)}{y} \, dy = \int_z^\infty \frac{e^{-y^2/2\sigma_i^2}}{\sigma_i^2} \, dy.$$

By the substitution $y = \sqrt{2}\sigma_i t$ we get $dy = \sqrt{2}\sigma_i \, dt$ and

$$\int_z^\infty \frac{\varphi_i(y)}{y} \, dy = \int_{z/\sqrt{2}\sigma_i}^\infty \frac{e^{-t^2}}{\sigma_i^2} \sqrt{2}\sigma_i \, dt = \sqrt{\frac{\pi}{2\sigma_i^2}} \, \mathrm{erfc}\!\left(\frac{z}{\sqrt{2}\sigma_i}\right),$$

where $\mathrm{erfc}(\cdot)$ is the complementary error function. Therefore, we have

$$\Omega_i(z) = \sqrt{\frac{\pi}{2\sigma_i^2}} \, \mathrm{erfc}\!\left(\frac{z}{\sqrt{2}\sigma_i}\right) - \frac{z}{\sigma_i^2} e^{-z^2/2\sigma_i^2}$$

and

$$\frac{d\Omega_i(z)}{dz} = \frac{e^{-z^2/2\sigma_i^2}}{\sigma_i^2} \left( \frac{z^2}{\sigma_i^2} - 2 \right),$$

which is smooth, and equal to zero only when $z = \sqrt{2}\sigma_i$. Therefore, the extreme values of $\Omega_i(z)$ must occur as $z \to 0$, at $z = \sqrt{2}\sigma_i$, or as $z \to \infty$. Evaluating $\Omega_i$ at these points gives $\Omega_i(z) \to \sqrt{\pi/2}\,\frac{1}{\sigma_i}$ as $z \to 0$,

$$\Omega_i(\sqrt{2}\sigma_i) = \frac{1}{\sigma_i} \underbrace{\left( \sqrt{\frac{\pi}{2}} \, \mathrm{erfc}(1) - \sqrt{2}\, e^{-1} \right)}_{\approx\, -0.323} > -\frac{1}{3\sigma_i},$$

and $\Omega_i(z) \to 0$ as $z \to \infty$, respectively. It follows that $\max_{i,z} |\Omega_i(z)| \leq \alpha$ if

$$\alpha \geq \max\left\{ \frac{\sqrt{\pi/2}}{\sigma_i}, \frac{1}{3\sigma_i} \right\} \iff \sigma_i \geq \frac{\sqrt{\pi/2}}{\alpha} \approx \frac{1.26}{\alpha}$$

for all $i$. This means that if
• the original (deterministic and outage-free) problem (37) is Fast-Lipschitz by qualifying condition (Qc4), i.e., $\alpha = 1$ in Corollary 22, and
• the fading follows a Rayleigh distribution with $\sigma_i \geq \sqrt{\pi/2}$,
then problem (40) is Fast-Lipschitz by Corollary 22. It follows that the equivalent problem (39) has a unique optimal solution given by $p^\star$ such that $p_i^\star = \Phi_i(I_i(p^\star))$, and the iterations $p_i^{k+1} := \Phi_i(I_i(p^k))$ converge to $p^\star$ in a decentralized and asynchronous setting.

Fig. 2: Plots from the uplink power control example. (a) The smoothed mobile behaviour function $\Phi_i(I_i)$ for different $\sigma_i$, when $F_i$ follows a Rayleigh distribution; the dashed lines show the best approximations that are absolutely subhomogeneous, as required in [45], and the dotted line shows the function $h_i(x)$ when $b_i = 1$. (b) The behaviour of $\Omega_i(z)$ for different $\sigma_i$; $\sigma_i = \sqrt{\pi/2} \approx 1.26$ is the lower limit of $\sigma_i$ for which Corollary 22 applies with $\alpha = 1$ (i.e., $|\Omega_i(z)| < 1$ for all $z$).
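The quantities plotted in Fig. 2b can be reproduced with a short script (a sketch; it uses scipy.special.erfc for the closed-form $\Omega_i$):

```python
import numpy as np
from scipy.special import erfc

def Omega(z, sigma):
    # Closed-form Omega_i(z) for Rayleigh fading, from equation (41).
    return (np.sqrt(np.pi / 2) / sigma * erfc(z / (np.sqrt(2) * sigma))
            - z / sigma**2 * np.exp(-z**2 / (2 * sigma**2)))

z = np.linspace(0.0, 20.0, 20_001)
for sigma in (1.0, np.sqrt(np.pi / 2), 3.0):
    print(f"sigma = {sigma:.2f}: max |Omega| = "
          f"{np.abs(Omega(z, sigma)).max():.3f}")
# For sigma >= sqrt(pi/2) ~ 1.26 the maximum is sqrt(pi/2)/sigma <= 1,
# so Corollary 22 applies with alpha = 1, matching Fig. 2b.
```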

V. CONCLUSIONS AND FUTURE WORK

In this paper we significantly extended the previous Fast-Lipschitz framework proposed in [28]. Based on these new results, a larger set of convex and non-convex optimization problems can be solved by the Fast-Lipschitz method of this paper, both in a centralized and in a distributed set-up. This avoids using Lagrangian methods, which are inefficient in terms of computation and communication complexity, especially when used over networks. Several possible extensions remain to consider, for example:
• We believe that also non-smooth problems can be Fast-Lipschitz.
• Through a simple transformation of our main problem (3), one can obtain a problem that fulfills one of the qualifying conditions even if problem (3) does not. By further investigation of these transformations, the qualifying conditions can be relaxed.
• So far, only problems where $\nabla f_0 \geq 0$ have been considered. The standard inequality we have used is the partial ordering induced by the non-negative orthant $\mathbb{R}^n_+$, i.e., $\nabla f_0 \geq_{\mathbb{R}^n_+} 0$. A possible extension is to allow for problems where $\nabla f_0 \geq_K 0$ for a more general cone $K$. This would allow a tradeoff between conditions on $\nabla f_0$ and conditions on $\nabla f$, which may give more flexibility in the qualifying conditions.

REFERENCES

[1] M. Jakobsson and C. Fischione, "Extensions of Fast-Lipschitz optimization for convex and non-convex problems," in 3rd IFAC Workshop on Distributed Estimation and Control in Networked Syst. (NecSys), 2012.


[2] J. Tsitsiklis, Problems in Decentralized Decision Making and Computation. PhD thesis, 1984.
[3] J. Tsitsiklis, D. Bertsekas, and M. Athans, "Distributed asynchronous deterministic and stochastic gradient optimization algorithms," IEEE Trans. Autom. Control, vol. 31, 1986.
[4] S. Boyd, L. Xiao, A. Mutapcic, and J. Mattingley, "Notes on decomposition methods," 2007.
[5] F. P. Kelly, A. K. Maulloo, and D. K. H. Tan, "Rate control for communication networks: Shadow prices, proportional fairness and stability," J. Operational Research Soc., vol. 49, no. 3, 1998.
[6] S. Low and D. Lapsley, "Optimization flow control – I: Basic algorithm and convergence," IEEE/ACM Trans. Netw., vol. 7, 1999.
[7] D. Palomar and M. Chiang, "Alternative distributed algorithms for network utility maximization: Framework and applications," IEEE Trans. Autom. Control, vol. 52, 2007.
[8] E. Wei, A. Ozdaglar, and A. Jadbabaie, "A distributed Newton method for network utility maximization, I: Algorithm," 2011. LIDS report 2832.
[9] D. Gabay and B. Mercier, "A dual algorithm for the solution of nonlinear variational problems via finite element approximation," Comp. & Math. with Applicat., vol. 2, no. 1, pp. 17–40, 1976.
[10] J. Eckstein and D. P. Bertsekas, "On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators," Math. Programming, vol. 55, pp. 293–318, 1992.
[11] I. Schizas, A. Ribeiro, and G. Giannakis, "Consensus in ad hoc WSNs with noisy links – Part I: Distributed estimation of deterministic signals," IEEE Trans. Signal Process., vol. 56, 2008.
[12] I. Schizas, G. Giannakis, S. Roumeliotis, and A. Ribeiro, "Consensus in ad hoc WSNs with noisy links – Part II: Distributed estimation and smoothing of random signals," IEEE Trans. Signal Process., vol. 56, 2008.
[13] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Found. and Trends in Mach. Learning, vol. 3, no. 1, pp. 1–122, 2011.
[14] A. Jadbabaie, J. Lin, and A. S. Morse, "Coordination of groups of mobile autonomous agents using nearest neighbor rules," IEEE Trans. Autom. Control, vol. 48, pp. 988–1001, June 2003.
[15] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Randomized gossip algorithms," IEEE Trans. Inf. Theory, vol. 52, pp. 2508–2530, 2006.
[16] A. Dimakis, A. Sarwate, and M. Wainwright, "Geographic gossip: Efficient averaging for sensor networks," IEEE Trans. Signal Process., vol. 56, 2008.
[17] A. Olshevsky and J. Tsitsiklis, "Convergence speed in distributed consensus and averaging," SIAM J. Control and Optimization, vol. 48, no. 1, pp. 33–55, 2009.
[18] A. Chiuso, F. Fagnani, L. Schenato, and S. Zampieri, "Gossip algorithms for distributed ranking," in Amer. Control Conf., pp. 5468–5473, 2011.
[19] A. Nedić and A. Ozdaglar, "Distributed subgradient methods for multi-agent optimization," IEEE Trans. Autom. Control, vol. 54, pp. 48–61, 2009.
[20] A. Nedić, A. Ozdaglar, and P. Parrilo, "Constrained consensus and optimization in multi-agent networks," IEEE Trans. Autom. Control, vol. 55, pp. 922–938, Apr. 2010.
[21] S. Sundhar Ram, A. Nedić, and V. Veeravalli, "Distributed stochastic subgradient projection algorithms for convex optimization," J. Optimization Theory and Applicat., vol. 147, pp. 516–545, 2010.
[22] J. Duchi, A. Agarwal, and M. Wainwright, "Dual averaging for distributed optimization: Convergence analysis and network scaling," IEEE Trans. Autom. Control, vol. 57, pp. 592–606, Mar. 2012.
[23] K. I. Tsianos, S. Lawlor, and M. G. Rabbat, "Push-sum distributed dual averaging for convex optimization," in IEEE 51st Annual Conference on Decision and Control (CDC), pp. 5453–5458, Dec. 2012.
[24] M. Zhu and S. Martinez, "On distributed convex optimization under inequality and equality constraints," IEEE Trans. Autom. Control, vol. 57, pp. 151–164, 2012.
[25] F. Zanella, D. Varagnolo, A. Cenedese, G. Pillonetto, and L. Schenato, "Newton-Raphson consensus for distributed convex optimization," in IEEE 50th Conf. Decision and Control, 2011.
[26] M. Zargham, A. Ribeiro, and A. Jadbabaie, "Network optimization under uncertainty," in IEEE 51st Conf. Decision and Control, pp. 7470–7475, Dec. 2012.
[27] K. I. Tsianos, S. Lawlor, and M. G. Rabbat, "Communication/computation tradeoffs in consensus-based distributed optimization," in Advances in Neural Inform. Process. Syst. 25, pp. 1952–1960, 2012.
[28] C. Fischione, "Fast-Lipschitz optimization with wireless sensor networks applications," IEEE Trans. Autom. Control, vol. 56, pp. 2319–2331, 2011.
[29] G. Notarstefano and F. Bullo, "Distributed abstract optimization via constraints consensus: Theory and applications," IEEE Trans. Autom. Control, vol. 56, pp. 2247–2261, 2011.
[30] H. Tuy, "Monotonic optimization: Problems and solution approaches," SIAM Journal on Optimization, vol. 11, no. 2, pp. 464–494, 2000.
[31] E. Jorswieck and E. Larsson, "Monotonic optimization framework for the two-user MISO interference channel," IEEE Trans. Commun., vol. 58, 2010.
[32] E. Björnson, G. Zheng, M. Bengtsson, and B. Ottersten, "Robust monotonic optimization framework for multicell MISO systems," IEEE Trans. Signal Process., vol. 60, no. 5, pp. 2508–2523, 2012.
[33] J. Zander, "Distributed cochannel interference control in cellular radio systems," IEEE Trans. Veh. Technol., vol. 41, pp. 305–311, Aug. 1992.
[34] S. Grandhi, R. Vijayan, and D. Goodman, "Distributed power control in cellular radio systems," IEEE Trans. Commun., vol. 42, pp. 226–228, 1994.
[35] G. Foschini and Z. Miljanic, "A simple distributed autonomous power control algorithm and its convergence," IEEE Trans. Veh. Technol., vol. 42, pp. 641–646, 1993.
[36] R. Yates, "A framework for uplink power control in cellular radio systems," IEEE J. Sel. Areas Commun., vol. 13, pp. 1341–1347, 1995.
[37] J. Herdtner and E. Chong, "Analysis of a class of distributed asynchronous power control algorithms for cellular wireless systems," IEEE J. Sel. Areas Commun., vol. 18, no. 3, pp. 436–446, 2000.
[38] D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997.
[39] C. Sung and K. Leung, "A generalized framework for distributed power control in wireless networks," IEEE Trans. Inf. Theory, vol. 51, pp. 2625–2635, 2005.
[40] H. Boche and M. Schubert, "The structure of general interference functions and applications," IEEE Trans. Inf. Theory, vol. 54, pp. 4980–4990, 2008.
[41] C. Fischione, "F-Lipschitz optimization," 2010. Technical report IR-EE-RT 2010:023. Available online: http://www.ee.kth.se/php/modules/publications/reports/2010/IR-EERT 2010 023.pdf.
[42] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1985.
[43] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, 2nd ed., Sept. 1999.
[44] K. B. Petersen and M. S. Pedersen, The Matrix Cookbook. 2008. Version of November 14, 2008.
[45] C. Nuzman, "Contraction approach to power control, with non-monotonic applications," in IEEE Global Telecommun. Conf. (GLOBECOM), pp. 5283–5287, Nov. 2007.

[29] G. Notarstefano and F. Bullo, “Distributed abstract optimization via constraints consensus: Theory and applications,” IEEE Trans. Autom. Control, vol. 56, pp. 2247–2261, 2011. [30] H. Tuy, “Monotonic optimization: Problems and solution approaches,” SIAM Journal on Optimization, vol. 11, no. 2, pp. 464–494, 2000. [31] E. Jorswieck and E. Larsson, “Monotonic optimization framework for the two-user miso interference channel,” IEEE Trans. Commun., vol. 58, 2010. [32] E. Bj¨ornson, G. Zheng, M. Bengtsson, and B. Ottersten, “Robust monotonic optimization framework for multicell miso systems,” IEEE Trans. Signal Process., vol. 60, no. 5, pp. 2508–2523, 2012. [33] J. Zander, “Distributed cochannel interference control in cellular radio systems,” IEEE Trans. Veh. Technol., vol. 41, pp. 305 –311, Aug. 1992. [34] S. Grandhi, R. Vijayan, and D. Goodman, “Distributed power control in cellular radio systems,” IEEE Trans. Commun., vol. 42, pp. 226 –228, 1994. [35] G. Foschini and Z. Miljanic, “A simple distributed autonomous power control algorithm and its convergence,” IEEE Trans. Veh. Technol., vol. 42, pp. 641–646, 1993. [36] R. Yates, “A framework for uplink power control in cellular radio systems,” IEEE J. Sel. Areas Commun., vol. 13, pp. 1341 –1347, 1995. [37] J. Herdtner and E. Chong, “Analysis of a class of distributed asynchronous power control algorithms for cellular wireless systems,” IEEE J. Sel. Areas Commun., vol. 18, no. 3, pp. 436–446, 2000. [38] D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Athena Scientific, 1997. [39] C. Sung and K. Leung, “A generalized framework for distributed power control in wireless networks,” IEEE Trans. Inf. Theory, vol. 51, pp. 2625–2635, 2005. [40] H. Boche and M. Schubert, “The structure of general interference functions and applications,” IEEE Trans. Inf. Theory, vol. 54, pp. 4980 –4990, 2008. [41] C. Fischione, “F-Lipschitz optimization,” 2010. Techical report IR-EE-RT 2010:023. Available online: http://www.ee.kth.se/php/modules/publications/reports/2010/IR-EERT 2010 023.pdf. [42] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1985. [43] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, 2nd ed., Sept. 1999. [44] K. B. Petersen and M. S. Pedersen, The Matrix Cookbook. 2008. Version, November 14, 2008. [45] C. Nuzman, “Contraction approach to power control, with nonmonotonic applications,” in IEEE Global Telecommun. Conf., (GLOBECOM), pp. 5283 –5287, Nov. 2007.