
arXiv:1304.5590v2 [cs.SY] 6 Nov 2013

Distributed Constrained Optimization by Consensus-Based Primal-Dual Perturbation Method

Tsung-Hui Chang⋆, Member, IEEE, Angelia Nedić†, Member, IEEE, and Anna Scaglione‡, Fellow, IEEE

Revised, Nov. 2013

Abstract: Various distributed optimization methods have been developed for solving problems that have simple local constraint sets and whose objective function is the sum of local cost functions of distributed agents in a network. Motivated by emerging applications in smart grid and distributed sparse regression, this paper studies distributed optimization methods for solving general problems with a coupled global cost function and inequality constraints. We consider a network scenario in which each agent has no global knowledge and can access only its local mapping and constraint functions. To solve this problem in a distributed manner, we propose a consensus-based distributed primal-dual perturbation (PDP) algorithm. In the algorithm, agents employ the average consensus technique to estimate the global cost and constraint functions via exchanging messages with neighbors, and meanwhile use a local primal-dual perturbed subgradient method to approach a global optimum. The proposed PDP method can handle not only smooth inequality constraints but also non-smooth constraints, such as some sparsity-promoting constraints arising in sparse optimization. We prove that the proposed PDP algorithm converges to an optimal primal-dual solution of the original problem under standard problem and network assumptions. Numerical results illustrating the performance of the proposed algorithm for a distributed demand response control problem in smart grid are also presented.

Index terms: Distributed optimization, constrained optimization, average consensus, primal-dual subgradient method, regression, smart grid, demand side management control

The work of Tsung-Hui Chang is supported by the National Science Council, Taiwan (R.O.C.), under Grant NSC 102-2221-E-011005-MY3. The work of Angelia Nedić is supported by the NSF grants CMMI 07-42538 and CCF 11-11342. The work of Anna Scaglione is supported by NSF CCF-1011811. ⋆ Tsung-Hui Chang is the corresponding author. Address: Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei 10607, Taiwan (R.O.C.). E-mail: [email protected]. † Angelia Nedić is with the Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. E-mail: [email protected]. ‡ Anna Scaglione is with the Department of Electrical and Computer Engineering, University of California, Davis, CA 95616, USA. E-mail: [email protected]. November 7, 2013

DRAFT


I. INTRODUCTION

Distributed optimization methods are becoming popular options for solving several engineering problems, including parameter estimation, detection and localization problems in sensor networks [1], [2], resource allocation problems in peer-to-peer/multi-cellular communication networks [3], [4], and distributed learning and regression problems in control [5] and machine learning [6]–[8], to name a few. In these applications, rather than pooling together all the relevant parameters that define the optimization problem, distributed agents, each of which has access to a local subset of such parameters, collaborate with each other to minimize a global cost function, subject to local variable constraints. Specifically, since it is not always efficient for the agents to exchange their local cost and constraint functions across the network, owing to the large size of the network, time-varying network topology, energy constraints and/or privacy issues, distributed optimization methods that utilize only local information and messages exchanged between neighboring agents have been of great interest; see [9]–[16] and references therein.

Contributions: Different from the existing works [9]–[14], where the local variable constraints are usually simple (in the sense that they can be handled via simple projection) and independent among agents, in this paper we consider a problem formulation that has a general set of convex inequality constraints that couple all the agents' optimization variables. In addition, similar to [17], the considered problem has a global (non-separable) convex cost function that depends on the sum of local mapping functions of the local optimization variables. Such a problem formulation appears, for example, in the classical regression problems, which have a wide range of applications. In addition, the considered formulation also arises in the demand response control and power flow control problems in the emerging smart grid systems [18]–[20].
More discussions about applications are presented in Section II-B. In this paper, we assume that each agent knows only its local mapping function and local constraint function. To solve this problem in a distributed fashion, we develop a novel distributed consensus-based primal-dual perturbation (PDP) algorithm, which combines the ideas of the primal-dual perturbed (sub-)gradient method [21], [22] and the average consensus techniques [10], [23], [24]. In each iteration of the proposed algorithm, agents exchange their local estimates of the global cost and constraint functions with their neighbors, and then perform one step of a primal-dual variable (sub-)gradient update. Instead of using the primal-dual iterates computed at the preceding iteration, as in most of the existing primal-dual subgradient based methods [15], [16], the (sub-)gradients in the proposed distributed PDP algorithm are computed based on perturbation points that can be efficiently computed using the messages exchanged with neighbors. In particular, we provide two efficient ways to compute the perturbation points, which can respectively handle smooth and non-smooth constraint functions. More importantly, we establish convergence results showing that the proposed distributed PDP algorithm ensures strong convergence of the local primal-dual iterates to a global optimal primal-dual solution of the considered problem. The proposed algorithm is applied to a distributed sparse regression problem and a distributed demand response control problem in smart grid. Numerical results for the two applications are presented to demonstrate the effectiveness of the proposed algorithm.

Related works: The distributed dual subgradient method (e.g., dual decomposition) [25] is a popular approach to solving a problem with coupled inequality constraints in a distributed manner. However, given the dual variables, this method requires the agents to globally solve the local subproblems, which may require considerable computational effort if the local cost and constraint functions have a complex structure. Consensus-based distributed primal-dual (PD) subgradient methods have been developed recently in [15], [16] for solving a problem with an objective function that is the sum of local convex cost functions, and with global convex inequality constraints.
In addition to having a different cost function from our problem formulation, the works in [15], [16] assumed that all the agents in the network have global knowledge of the inequality constraint function; both aspects are in sharp contrast to our formulation, where a non-separable objective function is considered and each agent can access only its local constraint function. Moreover, these works adopted the conventional PD subgradient updates [26], [27] without perturbation. Numerical results will show that these methods do not perform as well as the proposed algorithm with perturbation. Another recent development is the Bregman-distance based PD subgradient method proposed in [28] for solving an epigraph formulation of a min-max problem. The method in [28], however, assumes that the Lagrangian function has a unique saddle point in order to guarantee the convergence of the primal-dual iterates. In contrast, our proposed algorithm, which uses perturbed subgradients, does not require such an assumption.

Synopsis: Section II presents the problem formulation, applications, and a brief review of the centralized PD subgradient methods. Section III presents the proposed distributed consensus-based PDP algorithm. The assumptions and convergence analysis results are given in Section IV. Numerical results are presented in Section V. Finally, conclusions and a discussion of future extensions are drawn in Section VI.

II. PROBLEM FORMULATION, APPLICATIONS AND BRIEF REVIEW

A. Problem Formulation

We consider a network with N agents, denoted by V = {1, …, N}. We assume that, for all i = 1, …, N, each agent i has a local decision variable¹ x_i ∈ ℝ^K, a local constraint set X_i ⊆ ℝ^K, and a local mapping function f_i : ℝ^K → ℝ^M, in which f_i = (f_{i1}, …, f_{iM})^T with each f_{im} : ℝ^K → ℝ being continuous. The network cost function is given by

    F̄(x_1, …, x_N) ≜ F( ∑_{i=1}^N f_i(x_i) ),    (1)

where F : ℝ^M → ℝ and F̄ : ℝ^{NK} → ℝ are continuous. In addition, the agents are subject to a global inequality constraint ∑_{i=1}^N g_i(x_i) ⪯ 0, where g_i : ℝ^K → ℝ^P are continuous mappings for all i = 1, …, N; specifically, g_i = (g_{i1}, …, g_{iP})^T, with each g_{ip} : ℝ^K → ℝ being continuous. The vector inequality ∑_{i=1}^N g_i(x_i) ⪯ 0 is understood coordinate-wise. We assume that each agent i can access F(·), f_i(·), g_i(·) and X_i only, for all i = 1, …, N.

Under this local knowledge constraint, the agents seek to cooperate with each other to minimize the total network cost F̄(x_1, …, x_N) (or, equivalently, maximize the network utility −F̄(x_1, …, x_N)). Mathematically, the optimization problem can be formulated as follows:

    min_{x_i ∈ X_i, i=1,…,N}  F̄(x_1, …, x_N)   s.t.  ∑_{i=1}^N g_i(x_i) ⪯ 0.    (2)

The goal of this paper is to develop a distributed algorithm for solving (2) with each agent communicating with its neighbors only.

¹ Here, without loss of generality, we assume that all the agents have the same variable dimension K. The proposed algorithm and analysis can be easily generalized to the case of different variable dimensions.
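To make the notation of (1)–(2) concrete, the following sketch evaluates F̄ and the coupled constraint for a small hypothetical instance (N = 3 agents, K = 2, M = P = 1); the quadratic F and the affine maps f_i, g_i are illustrative choices, not taken from the paper:

```python
import numpy as np

np.random.seed(0)
N, K, M, P = 3, 2, 1, 1

# Hypothetical local data: f_i(x_i) = A_i x_i and g_i(x_i) = c_i x_i - b_i
A = [np.random.randn(M, K) for _ in range(N)]
c = [np.random.randn(P, K) for _ in range(N)]
b = [np.ones(P) for _ in range(N)]

def F(s):
    # global cost F: R^M -> R (an illustrative convex quadratic)
    return float(s @ s)

def F_bar(x_list):
    # network cost (1): F applied to the sum of the local mapping functions
    return F(sum(A[i] @ x_list[i] for i in range(N)))

def coupled_constraint(x_list):
    # left-hand side of the coupled inequality sum_i g_i(x_i) <= 0 in (2)
    return sum(c[i] @ x_list[i] - b[i] for i in range(N))

x = [np.zeros(K) for _ in range(N)]
# x = 0 is feasible for this instance since every b_i > 0
feasible = bool(np.all(coupled_constraint(x) <= 0))
```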

B. Application to Smart Grid Control

In this subsection, we discuss some smart grid control problems where the problem formulation (2) may arise. Consider a power grid system where a retailer (e.g., the utility company) bids electricity from the power market and serves a residential/industrial neighborhood with N customers. In addition to paying for its market bid, the retailer has to pay an additional cost if there is a deviation between the bid purchased in earlier market settlements and the real-time aggregate load of the customers. Any demand excess or shortfall results in a cost for the retailer that mirrors the effort to maintain the power balance. In the smart grid, thanks to the advances in communication and sensory technologies, it is envisioned that the retailer can observe the load of customers and can even control the power usage of some of the appliances (e.g., controlling the charging rate of electrical vehicles and turning ON/OFF air conditioning systems), which is known as demand side management (DSM); see [29] for a recent review. We let p_t, t = 1, …, T, be the power bids over a time horizon of length T, and let ψ_{i,t}(x_i), t = 1, …, T, be the load profile of customer i, where x_i ∈ ℝ^K contains some control variables. The structures of ψ_{i,t} and x_i depend on the appliance load model. As mentioned, the retailer aims to minimize the cost caused by power imbalance, e.g., [18], [19], [29],

    min_{x_1∈X_1,…,x_N∈X_N}  C_p( ( ∑_{i=1}^N ψ_i(x_i) − p )^+ ) + C_s( ( p − ∑_{i=1}^N ψ_i(x_i) )^+ ),    (3)

where (x)^+ = max{x, 0} (taken coordinate-wise), X_i denotes the local control constraint set, C_p, C_s : ℝ^T → ℝ denote the cost functions due to insufficient and excessive power bids, respectively, p = (p_1, …, p_T)^T, and ψ_i = (ψ_{i,1}, …, ψ_{i,T})^T. By defining z = ( ∑_{i=1}^N ψ_i(x_i) − p )^+ and assuming that C_p is monotonically increasing, one can write (3) as

    min_{x_1∈X_1,…,x_N∈X_N, z⪰0}  C_p(z) + C_s( z − ∑_{i=1}^N ψ_i(x_i) + p )
    s.t.  ∑_{i=1}^N ψ_i(x_i) − p − z ⪯ 0,    (4)

which belongs to the considered formulation in (2). Similar problem formulations also arise in the microgrid control problems [20], [30], where the microgrid controller is required not only to control the loads but also the local power generation and local power storage (i.e., power flow control), in order to maintain power balance within the microgrid; see [30] for
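The passage from (3) to (4) rests on the elementwise identity (s − p)^+ − (s − p) = (p − s)^+. A quick numerical check, with illustrative monotonically increasing linear costs C_p, C_s chosen only for this test, confirms that substituting z = (∑_i ψ_i − p)^+ into the objective of (4) reproduces the objective value of (3):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 6
s = rng.normal(size=T)          # a fixed value of the aggregate load sum_i psi_i(x_i)
p = rng.normal(size=T)          # power bids

pos = lambda v: np.maximum(v, 0.0)
# illustrative monotone costs: weighted sums with positive weights
wp, ws = rng.uniform(1, 2, T), rng.uniform(1, 2, T)
Cp = lambda v: float(wp @ v)
Cs = lambda v: float(ws @ v)

cost3 = Cp(pos(s - p)) + Cs(pos(p - s))   # objective (3)
z = pos(s - p)                            # the substitution used to obtain (4)
cost4 = Cp(z) + Cs(z - s + p)             # objective (4) evaluated at this z
```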

detailed formulations. Distributed control methods are appealing for the smart grid application since all the agents are identical and the failure of one agent does not have a significant impact on the performance of the whole system [31]. Besides, distributed control also spares the retailer/microgrid controller from the task of collecting real-time information of customers, which not only infringes on the customers' privacy but is also impractical for a large-scale neighborhood. In Section V, the proposed distributed algorithm will be applied to a DSM problem as in (4).

In addition to the smart grid applications, problem (2) incorporates the important regression problems that widely appear in control [5], machine learning [6], [7], data mining [32], [33] and image processing [7] applications. Formulation (2) also encompasses the network flow control problems [34]; see [35] for an example which considered maximizing the network lifetime. The proposed distributed algorithm therefore can be applied to these problems as well. For example, in [36], we have shown how the proposed distributed algorithm can be applied to handle a distributed sparse regression problem.

C. Centralized PD Subgradient Method

Let us consider the following Lagrange dual problem of (2):

    max_{λ⪰0} { min_{x∈X} L(x, λ) },    (5)

where x = (x_1^T, …, x_N^T)^T, X = X_1 × ⋯ × X_N, λ ∈ ℝ^P_+ (i.e., the non-negative orthant in ℝ^P) is the dual variable associated with the inequality constraint ∑_{i=1}^N g_i(x_i) ⪯ 0, and L : ℝ^{NK} × ℝ^P_+ → ℝ is the Lagrangian function, given by

    L(x, λ) = F̄(x_1, …, x_N) + λ^T ( ∑_{i=1}^N g_i(x_i) ).    (6)

Throughout the paper, we assume that problem (2) is convex, i.e., X is closed and convex, F̄(x) is convex in x and each g_i(x_i) is convex in x_i. We also assume that the Slater condition holds, i.e., there is an (x̄_1, …, x̄_N) that lies in the relative interior of X_1 × ⋯ × X_N such that ∑_{i=1}^N g_i(x̄_i) ≺ 0. Hence, strong duality holds for problem (2) [37], and problem (2) can be handled by solving its dual (5). A classical approach is the dual subgradient method [38]. One limitation of this method is that the inner problem min_{x∈X} L(x, λ) needs to be globally solved at each iteration, which is not always easy, especially when f_i(x_i) and g_i(x_i) are complex or when the problem is large scale. Another approach is the primal-dual (PD)

subgradient method [26], [39], which handles the inner problem inexactly. More precisely, at iteration k, the PD subgradient method performs

    x^(k) = P_X( x^(k−1) − a_k L_x(x^(k−1), λ^(k−1)) ),    (7a)
    λ^(k) = ( λ^(k−1) + a_k L_λ(x^(k−1), λ^(k−1)) )^+,    (7b)

where P_X : ℝ^{NK} → X is a projection function, a_k > 0 is a step size, and

    L_x(x^(k), λ^(k)) ≜ [ L_{x_1}(x^(k), λ^(k)) ; … ; L_{x_N}(x^(k), λ^(k)) ]
                      = [ ∇f_1^T(x_1^(k)) ∇F( ∑_{i=1}^N f_i(x_i^(k)) ) + ∇g_1^T(x_1^(k)) λ^(k) ; … ; ∇f_N^T(x_N^(k)) ∇F( ∑_{i=1}^N f_i(x_i^(k)) ) + ∇g_N^T(x_N^(k)) λ^(k) ],    (8a)
    L_λ(x^(k), λ^(k)) ≜ ∑_{i=1}^N g_i(x_i^(k)),    (8b)

represent the subgradients of L at (x^(k), λ^(k)) with respect to x and λ, respectively. Each ∇g_i(x_i^(k)) is a P × K Jacobian matrix with rows equal to the subgradients ∇g_{ip}^T(x_i^(k)), p = 1, …, P (gradients if they are continuously differentiable), and each ∇f_i(x_i^(k)) is an M × K Jacobian matrix with rows containing the gradients ∇f_{im}^T(x_i^(k)), m = 1, …, M.

The idea behind the PD subgradient method lies in the well-known saddle-point relation:

Theorem 1 (Saddle-Point Theorem) [37] The point (x⋆, λ⋆) ∈ X × ℝ^P_+ is a primal-dual solution pair of problems (2) and (5) if and only if there holds:

    L(x⋆, λ) ≤ L(x⋆, λ⋆) ≤ L(x, λ⋆)   ∀x ∈ X, λ ⪰ 0.    (9)

According to Theorem 1, if the PD subgradient method converges to a saddle point of the Lagrangian function (6), then it solves the original problem (2). Convergence properties of the PD method in (7) have been studied extensively; see, for example, [26], [27], [39]. In such methods, typically a subsequence of the sequence (x^(k), λ^(k)) converges to a saddle point of the Lagrangian function in (6). To ensure the convergence of the whole sequence (x^(k), λ^(k)), it is often assumed that the Lagrangian function is strictly convex in x and strictly concave in λ, which, however, does not hold in general. One of the approaches to circumventing this condition is the primal-dual perturbed (PDP) subgradient method in [21], [22]. Specifically, [21] suggests to update x^(k−1) and λ^(k−1) based
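As a sanity check of the updates (7)–(8), the following sketch runs the centralized PD subgradient method on a small hypothetical instance (N = 2, K = M = P = 1, F(s) = s, f_i(x_i) = x_i², g_i(x_i) = 0.5 − x_i, X_i = [−5, 5]), whose saddle point is x_i⋆ = 0.5 and λ⋆ = 1; the step size a_k = (k+1)^(−0.6) and the iteration count are illustrative choices:

```python
import numpy as np

# Toy instance: minimize x1^2 + x2^2 s.t. (0.5 - x1) + (0.5 - x2) <= 0, x_i in [-5, 5]
x = np.zeros(2)              # primal iterate
lam = 0.0                    # dual iterate (P = 1)
xbar, A = np.zeros(2), 0.0   # running weighted average of primal iterates, cf. (36)

for k in range(100000):
    a = (k + 1) ** -0.6
    # (8a): grad_x L = 2 x_i * 1 + (-1) * lam   (F(s) = s, so grad F = 1; g_i' = -1)
    grad_x = 2.0 * x - lam
    # (8b): grad_lam L = sum_i g_i(x_i)
    grad_lam = (0.5 - x[0]) + (0.5 - x[1])
    # (7a)-(7b): projected primal descent and projected dual ascent
    x = np.clip(x - a * grad_x, -5.0, 5.0)
    lam = max(0.0, lam + a * grad_lam)
    A += a
    xbar += (a / A) * (x - xbar)
```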

on some perturbation points, denoted by α̂^(k) and β̂^(k), respectively. The PDP updates are

    x^(k) = P_X( x^(k−1) − a_k L_x(x^(k−1), β̂^(k)) ),    (10a)
    λ^(k) = ( λ^(k−1) + a_k L_λ(α̂^(k), λ^(k−1)) )^+.    (10b)

Note that, in (10a), we have replaced λ^(k−1) by β̂^(k), and, in (10b), replaced x^(k−1) by α̂^(k), and thus L_x(x^(k−1), β̂^(k)) and L_λ(α̂^(k), λ^(k−1)) are perturbed subgradients. It was shown in [21] that, with carefully chosen (α̂^(k), β̂^(k)) and step size a_k, the primal-dual iterates in (10) converge to a saddle point of (5), without any strict convexity and concavity assumptions on L.

There are several ways to generate the perturbation points α̂^(k) and β̂^(k). Our interests lie specifically in those that are computationally as efficient as the PD subgradient updates in (10). Depending on the smoothness of {g_ip}_{p=1}^P, we consider the following two methods:

Gradient Perturbation Points: A simple approach to computing the perturbation points is using the conventional gradient updates exactly as in (7), i.e.,

    α̂^(k) = P_X( x^(k−1) − ρ_1 L_x(x^(k−1), λ^(k−1)) ),    (11a)
    β̂^(k) = ( λ^(k−1) + ρ_2 L_λ(x^(k−1), λ^(k−1)) )^+,    (11b)

where ρ_1 > 0 and ρ_2 > 0 are constants. The PDP subgradient method thus combines (10) and (11), which involve two primal and dual subgradient updates. Even though the updates are relatively simple, this method requires smooth constraint functions g_ip, p = 1, …, P. In cases where g_ip, p = 1, …, P, are non-smooth, we propose to use the following proximal perturbation point approach, which is novel and has not appeared in the earlier works [21], [22].

Proximal Perturbation Points: When g_ip, p = 1, …, P, are non-smooth, we compute the perturbation point α̂^(k) by the following proximal gradient update² [40]:

    α̂^(k) = arg min_{α∈X} { ∑_{i=1}^N g_i^T(α_i) λ^(k−1) + (1/(2ρ_1)) ‖ α − ( x^(k−1) − ρ_1 ∇F̄(x^(k−1)) ) ‖² },    (12)

where α = (α_1^T, …, α_N^T)^T and

    ∇F̄(x^(k−1)) = [ ∇f_1^T(x_1^(k−1)) ∇F( ∑_{i=1}^N f_i(x_i^(k−1)) ) ; … ; ∇f_N^T(x_N^(k−1)) ∇F( ∑_{i=1}^N f_i(x_i^(k−1)) ) ].    (13)

² If not mentioned specifically, the norm function ‖·‖ stands for the Euclidean norm.
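As a quick check of (12) in a non-smooth case: with g_i(α_i) = ‖α_i‖₁, X = ℝ^{NK} and P = 1, the minimization separates per coordinate and its minimizer is coordinate-wise soft-thresholding (cf. the discussion around (14)). The sketch below verifies the closed form against a brute-force grid search; the data are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
rho1, lam = 0.3, 0.8        # rho_1 and the scalar dual iterate lambda^(k-1)
b = rng.normal(size=5)      # stands in for b = x^(k-1) - rho1 * grad F_bar(x^(k-1))

def prox_objective(alpha):
    # objective of (12) with g_i the l1-norm: lam*||alpha||_1 + ||alpha - b||^2/(2 rho1)
    return lam * np.abs(alpha).sum() + np.sum((alpha - b) ** 2) / (2 * rho1)

# closed-form minimizer: soft-thresholding with threshold rho1*lam
alpha_hat = np.sign(b) * np.maximum(np.abs(b) - rho1 * lam, 0.0)

# brute-force check, coordinate by coordinate (the objective is separable)
grid = np.linspace(-5.0, 5.0, 100001)
for j in range(b.size):
    vals = lam * np.abs(grid) + (grid - b[j]) ** 2 / (2 * rho1)
    assert abs(grid[np.argmin(vals)] - alpha_hat[j]) < 1e-3
```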

It is worthwhile to note that, when g_ip, p = 1, …, P, are some sparsity-promoting functions (e.g., the 1-norm, 2-norm and the nuclear norm) that often arise in sparse optimization problems [7], [41], [42], the proximal perturbation point in (12) can be computed very efficiently and may even have a closed-form solution. For example, if g_i(α_i) = ‖α_i‖₁ for all i (P = 1) and X = ℝ^{KN}, (12) has a closed-form solution known as the soft-thresholding operator [7]:

    α̂^(k) = ( b − ρ_1 λ^(k−1) 1 )^+ − ( −b − ρ_1 λ^(k−1) 1 )^+,    (14)

where b = x^(k−1) − ρ_1 ∇F̄(x^(k−1)) and 1 is an all-one vector.

III. PROPOSED CONSENSUS-BASED DISTRIBUTED PDP ALGORITHM

Our goal is to develop a distributed counterpart of the PDP subgradient method in (10). Consider the following saddle-point problem

    max_{λ∈D} { min_{x_i∈X_i, i=1,…,N} L(x_1, …, x_N, λ) },    (15)

where

    D = { λ ⪰ 0 | ‖λ‖ ≤ D_λ ≜ (F̄(x̄) − q̃)/γ + δ },    (16)

in which x̄ = (x̄_1^T, …, x̄_N^T)^T is a Slater point of (2), q̃ = min_{x_i∈X_i, i=1,…,N} L(x_1, …, x_N, λ̃) is the dual function value for some arbitrary λ̃ ⪰ 0, γ = min_{p=1,…,P} { −∑_{i=1}^N g_{ip}(x̄_i) }, and δ > 0 is arbitrary. It has been shown in [43] that the optimal dual solution λ̂⋆ of (5) satisfies

    ‖λ̂⋆‖ ≤ (F̄(x̄) − q̃)/γ,    (17)

and thus λ̂⋆ lies in D. Here we consider the saddle-point problem (15), instead of the original Lagrange dual problem (5), because D bounds the dual variable λ and thus also bounds the subgradient L_x(x^(k), λ^(k)) in (8a). This property is important in establishing the convergence of the distributed algorithm to be discussed shortly. Both (5) and (15) have the same optimal dual solution λ̂⋆ and attain the same optimal objective value. One can further verify that any saddle point of (5) is also a saddle point of (15). However, to relate the saddle points of (15) to solutions of problem (2), some conditions are needed, as given in the following proposition.
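The dual radius in (16)–(17) is easy to evaluate in practice. For a hypothetical instance with F̄(x) = x₁² + x₂², ∑_i g_i(x_i) = 1 − x₁ − x₂ and X_i = [−5, 5], taking the Slater point x̄ = (5, 5) and λ̃ = 0 gives q̃ = min F̄ = 0 and γ = 9, so any δ > 0 yields a valid radius D_λ; the sketch below simply evaluates these quantities (the instance is illustrative, not from the paper):

```python
# Toy instance: F_bar(x) = x1^2 + x2^2, sum_i g_i(x_i) = 1 - x1 - x2, X_i = [-5, 5]
x_slater = (5.0, 5.0)                   # strictly feasible: 1 - 5 - 5 = -9 < 0
F_bar = lambda x: x[0] ** 2 + x[1] ** 2
g_sum = lambda x: 1.0 - x[0] - x[1]

q_tilde = 0.0                           # min_x L(x, 0) = min F_bar = 0, attained at x = 0
gamma = -g_sum(x_slater)                # = 9 (P = 1, so the min over p is trivial)
delta = 0.1
D_lambda = (F_bar(x_slater) - q_tilde) / gamma + delta   # radius D_lambda in (16)

# the optimal dual solution lambda* = 1 of this instance indeed lies inside D, cf. (17)
contains_optimal_dual = 1.0 <= D_lambda
```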

Proposition 1 (Primal-dual optimality conditions) [44] Let the Slater condition hold and let (x̂_1⋆, …, x̂_N⋆, λ̂⋆) be a saddle point of (15). Then (x̂_1⋆, …, x̂_N⋆) is an optimal solution of problem (2) if and only if

    ∑_{i=1}^N g_i(x̂_i⋆) ⪯ 0   and   (λ̂⋆)^T ( ∑_{i=1}^N g_i(x̂_i⋆) ) = 0.

To have a distributed optimization algorithm for solving (15), in addition to x_i^(k), we let each agent i have a local copy of the dual iterate λ^(k), denoted by λ_i^(k). Moreover, each agent i owns two auxiliary variables, denoted by y_i^(k) and z_i^(k), representing respectively the local estimates of the average values of the argument function (1/N) ∑_{i=1}^N f_i(x_i^(k)) and of the inequality constraint function (1/N) ∑_{i=1}^N g_i(x_i^(k)), for all i = 1, …, N. We consider a time-varying synchronous network model [11], where the network of agents at time k is represented by a weighted directed graph G(k) = (V, E(k), W(k)). Here (i, j) ∈ E(k) if and only if agent i can receive messages from agent j, and W(k) ∈ ℝ^{N×N} is a weight matrix with each entry [W(k)]_{ij} representing the weight that agent i assigns to the incoming message on link (i, j) at time k. If (i, j) ∈ E(k), then [W(k)]_{ij} > 0, and [W(k)]_{ij} = 0 otherwise. The agents exchange messages with their neighbors (according to the network graph G(k)) in order to achieve consensus on λ^(k), ∑_{i=1}^N g_i(x_i^(k)) and ∑_{i=1}^N f_i(x_i^(k)), while computing the local perturbation points and primal-dual (sub-)gradient updates locally. Specifically, the proposed distributed consensus-based PDP method consists of the following steps at each iteration k:

1) Averaging consensus: For i = 1, …, N, each agent i sends y_i^(k−1), z_i^(k−1) and λ_i^(k−1) to all its neighbors j satisfying (j, i) ∈ E(k); it also receives y_j^(k−1), z_j^(k−1) and λ_j^(k−1) from its neighbors, and combines the received estimates as follows:

    ỹ_i^(k) = ∑_{j=1}^N [W(k)]_{ij} y_j^(k−1),   z̃_i^(k) = ∑_{j=1}^N [W(k)]_{ij} z_j^(k−1),   λ̃_i^(k) = ∑_{j=1}^N [W(k)]_{ij} λ_j^(k−1).    (18)

2) Perturbation point computation: For i = 1, …, N, if the functions g_ip, p = 1, …, P, are smooth, then each agent i computes the local perturbation points by

    α_i^(k) = P_{X_i}( x_i^(k−1) − ρ_1 [ ∇f_i^T(x_i^(k−1)) ∇F(N ỹ_i^(k)) + ∇g_i^T(x_i^(k−1)) λ̃_i^(k) ] ),    (19a)
    β_i^(k) = P_D( λ̃_i^(k) + ρ_2 N z̃_i^(k) ).    (19b)

Note that, compared to (11) and (12), agent i here uses the most up-to-date estimates N ỹ_i^(k), N z̃_i^(k) and λ̃_i^(k) in place of ∑_{i=1}^N f_i(x_i^(k−1)), ∑_{i=1}^N g_i(x_i^(k−1)) and λ^(k−1), respectively.

Algorithm 1 Distributed Consensus-Based PDP Algorithm
1: Given initial variables x_i^(0) ∈ X_i, λ_i^(0) ⪰ 0, y_i^(0) = f_i(x_i^(0)) and z_i^(0) = g_i(x_i^(0)) for each agent i, i = 1, …, N; set k = 1.
2: repeat
3:   Averaging consensus: For i = 1, …, N, each agent i computes (18).
4:   Perturbation point computation: For i = 1, …, N, if {g_ip}_{p=1}^P are smooth, then each agent i computes the local perturbation points by (19); otherwise, each agent i instead computes α_i^(k) by (20).
5:   Local variable updates: For i = 1, …, N, each agent i updates (21), (22), (23) and (24) sequentially.
6:   Set k = k + 1.
7: until a predefined stopping criterion (e.g., a maximum iteration number) is satisfied.

If g_ip, p = 1, …, P, are non-smooth, agent i instead computes α_i^(k) by

    α_i^(k) = arg min_{α_i∈X_i} { g_i^T(α_i) λ̃_i^(k) + (1/(2ρ_1)) ‖ α_i − ( x_i^(k−1) − ρ_1 ∇f_i^T(x_i^(k−1)) ∇F(N ỹ_i^(k)) ) ‖² },    (20)

for i = 1, …, N.

3) Primal-dual perturbed subgradient update: For i = 1, …, N, each agent i updates its primal and dual variables (x_i^(k), λ_i^(k)) based on the local perturbation point (α_i^(k), β_i^(k)):

    x_i^(k) = P_{X_i}( x_i^(k−1) − a_k [ ∇f_i^T(x_i^(k−1)) ∇F(N ỹ_i^(k)) + ∇g_i^T(x_i^(k−1)) β_i^(k) ] ),    (21)
    λ_i^(k) = P_D( λ̃_i^(k) + a_k g_i(α_i^(k)) ).    (22)

4) Auxiliary variable update: For i = 1, …, N, each agent i updates the variables y_i^(k), z_i^(k) with the changes of the local argument function f_i(x_i^(k)) and the constraint function g_i(x_i^(k)):

    y_i^(k) = ỹ_i^(k) + f_i(x_i^(k)) − f_i(x_i^(k−1)),    (23)
    z_i^(k) = z̃_i^(k) + g_i(x_i^(k)) − g_i(x_i^(k−1)).    (24)

Algorithm 1 summarizes the above steps. We prove that Algorithm 1 converges under proper problem and network assumptions in the next section. Readers who are interested mainly in the numerical performance of Algorithm 1 may go directly to Section V.
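To illustrate how the steps (18)–(24) fit together, the following sketch simulates Algorithm 1 with the gradient perturbation points (19) on a small hypothetical instance (N = 2, F(s) = s, f_i(x_i) = x_i², g_i(x_i) = 0.5 − x_i, X_i = [−5, 5]), using a complete graph with uniform weights; the constants ρ_1 = ρ_2 = 0.1, D_λ = 10 and a_k = (k+1)^(−0.6) are illustrative, not tuned values from the paper:

```python
import numpy as np

N, Dlam = 2, 10.0
rho1 = rho2 = 0.1
W = np.full((N, N), 1.0 / N)        # doubly stochastic weights (complete graph)

f  = lambda x: x * x                 # f_i (M = 1)
df = lambda x: 2.0 * x               # gradient of f_i
g  = lambda x: 0.5 - x               # g_i (P = 1, gradient -1)

x = np.zeros(N)
lam = np.zeros(N)                    # local dual copies lambda_i
y, z = f(x), g(x)                    # auxiliary variables, initialized as in Algorithm 1
xbar, A = np.zeros(N), 0.0

for k in range(100000):
    a = (k + 1) ** -0.6
    yt, zt, lt = W @ y, W @ z, W @ lam                      # (18) averaging consensus
    # (19a)-(19b) gradient perturbation points (F(s) = s, so grad F = 1)
    alpha = np.clip(x - rho1 * (df(x) - lt), -5.0, 5.0)
    beta = np.clip(lt + rho2 * N * zt, 0.0, Dlam)
    # (21)-(22) primal-dual perturbed subgradient updates
    x_new = np.clip(x - a * (df(x) - beta), -5.0, 5.0)
    lam = np.clip(lt + a * g(alpha), 0.0, Dlam)
    # (23)-(24) auxiliary variable updates
    y = yt + f(x_new) - f(x)
    z = zt + g(x_new) - g(x)
    x = x_new
    A += a
    xbar += (a / A) * (x - xbar)
```

At the saddle point of this instance, x_i⋆ = 0.5 and λ⋆ = 1, the updates are stationary, and the running weighted averages of the local primal iterates approach x⋆.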

IV. CONVERGENCE ANALYSIS

Next, in Section IV-A, we present additional assumptions on problem (2) and the network model. The main convergence results are presented in Section IV-B. The proofs are presented in Section IV-C and Section IV-D.

A. Assumptions

Our results will make use of the following assumption.

Assumption 1
(a) The sets X_i, i = 1, …, N, are compact. In particular, there is a constant D_x > 0 such that for i = 1, …, N,

    ‖x_i‖ ≤ D_x   ∀x_i ∈ X_i;    (25)

(b) The functions f_{i1}, …, f_{iM}, i = 1, …, N, are continuously differentiable.

Note that Assumption 1(a) and Assumption 1(b) imply that f_{i1}, …, f_{iM} have uniformly bounded gradients (denoted by ∇f_{im}, m = 1, …, M) and are Lipschitz continuous, i.e., for some L_f > 0,

    max_{1≤m≤M} ‖∇f_{im}(x_i)‖ ≤ L_f   ∀x_i ∈ X_i,    (26)
    max_{1≤m≤M} |f_{im}(x_i) − f_{im}(y_i)| ≤ L_f ‖x_i − y_i‖   ∀x_i, y_i ∈ X_i.    (27)

Similarly, Assumption 1(a) and the convexity of the functions g_{i1}, …, g_{iP} imply that all g_{ip} have uniformly bounded subgradients, which is equivalent to all g_{ip} being Lipschitz continuous. Thus, for some L_g > 0, we have

    max_{1≤p≤P} ‖∇g_{ip}(x_i)‖ ≤ L_g   ∀x_i ∈ X_i,    (28)
    max_{1≤p≤P} |g_{ip}(x_i) − g_{ip}(y_i)| ≤ L_g ‖x_i − y_i‖   ∀x_i, y_i ∈ X_i.    (29)

In addition, by Assumption 1 and the continuity of each g_{ip} (which is implied by the convexity of g_{ip}), each f_i and g_i is also bounded on X, i.e., there exist constants C_f > 0 and C_g > 0 such that for all i = 1, …, N,

    ‖f_i(x_i)‖ ≤ C_f,   ‖g_i(x_i)‖ ≤ C_g   ∀x_i ∈ X_i,    (30)

where ‖f_i(x_i)‖ = ( ∑_{m=1}^M f_{im}²(x_i) )^{1/2} and ‖g_i(x_i)‖ = ( ∑_{p=1}^P g_{ip}²(x_i) )^{1/2}.

We also make use of the following assumption on the network utility costs F and F̄:

Assumption 2
(a) The function F is continuously differentiable and has bounded and Lipschitz continuous gradients, i.e., for some G_F > 0 and L_F > 0, we have

    ‖∇F(x) − ∇F(y)‖ ≤ G_F ‖x − y‖   ∀x, y ∈ ℝ^M,    (31)
    ‖∇F(y)‖ ≤ L_F   ∀y ∈ ℝ^M;    (32)

(b) The function F̄ has Lipschitz continuous gradients, i.e., for some G_F̄ > 0,

    ‖∇F̄(x) − ∇F̄(y)‖ ≤ G_F̄ ‖x − y‖   ∀x, y ∈ X.    (33)

Note that the convexity of F̄ and Assumption 1(a) imply that F̄ is Lipschitz continuous, i.e., for some L_F̄ > 0,

    |F̄(x) − F̄(y)| ≤ L_F̄ ‖x − y‖   ∀x, y ∈ X.    (34)

Assumptions 1 and 2 are used to ensure that the (sub-)gradients of the Lagrangian function L(x, λ) with respect to x are well behaved for applying (sub-)gradient-based methods. In cases where g_ip, p = 1, …, P, are smooth, we make use of the following additional assumption:

Assumption 3 The functions g_ip, p = 1, …, P, are continuously differentiable and have Lipschitz continuous gradients, i.e., there exists a constant G_g > 0 such that

    max_{1≤p≤P} ‖∇g_{ip}(x_i) − ∇g_{ip}(y_i)‖ ≤ G_g ‖x_i − y_i‖   ∀x_i, y_i ∈ X_i.    (35)

We also have the following assumption on the network model [11], [17]:

Assumption 4 The weighted graphs G(k) = (V, E(k), W(k)) satisfy:
(a) There exists a scalar η with 0 < η < 1 such that [W(k)]_{ii} > η for all i, k, and [W(k)]_{ij} > η whenever [W(k)]_{ij} > 0.
(b) W(k) is doubly stochastic: ∑_{j=1}^N [W(k)]_{ij} = 1 for all i, k, and ∑_{i=1}^N [W(k)]_{ij} = 1 for all j, k.
(c) There is an integer Q such that (V, ∪_{ℓ=1,…,Q} E(k + ℓ)) is strongly connected for all k.

Assumption 4 ensures that all the agents can sufficiently and equally influence each other in the long run.
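For a static undirected graph, the Metropolis-Hastings weight rule is one standard construction from the consensus literature that produces matrices with the positive self-weights and double stochasticity required by Assumption 4 (the rule itself is a common illustrative choice; the paper only states the properties, not a particular construction):

```python
import numpy as np

# Undirected path graph on N = 4 nodes: edges (0,1), (1,2), (2,3)
N = 4
edges = [(0, 1), (1, 2), (2, 3)]
deg = np.zeros(N, dtype=int)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

W = np.zeros((N, N))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0 / (1 + max(deg[i], deg[j]))  # Metropolis weight per edge
for i in range(N):
    W[i, i] = 1.0 - W[i].sum()                           # self-weight closes each row

# repeated averaging drives any initial vector to consensus on its mean
v = np.array([1.0, 2.0, 3.0, 4.0])
consensus = np.linalg.matrix_power(W, 200) @ v
```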

B. Main Convergence Results

Let A_k = ∑_{ℓ=1}^k a_ℓ, and let

    x̂_i^(k−1) = (1/A_k) ∑_{ℓ=1}^k a_ℓ x_i^(ℓ−1),   i = 1, …, N,    (36)

be the running weighted averages of the primal iterates x_i^(0), …, x_i^(k−1) generated by agent i until time k − 1. Our main convergence result for Algorithm 1 is given in the following theorem:

Theorem 2 Let Assumptions 1–4 hold, and let ρ_1 ≤ 1/(G_F̄ + D_λ √P G_g). Assume that the step size sequence {a_k} is non-increasing and such that a_k > 0 for all k ≥ 1, ∑_{k=1}^∞ a_k = ∞ and ∑_{k=1}^∞ a_k² < ∞. Let the sequences {x̂^(k)} and {λ_i^(k)}, i = 1, …, N, be generated by Algorithm 1 using the gradient perturbation points in (19). Then {x̂^(k)} and {λ_i^(k)}, i = 1, …, N, converge to an optimal primal solution x⋆ ∈ X and an optimal dual solution λ⋆ of problem (2), respectively.

Theorem 2 indicates that the proposed distributed primal-dual algorithm asymptotically yields an optimal primal and dual solution pair for the original problem (2). The same convergence result holds if the constraint functions g_ip, p = 1, …, P, are non-smooth and the perturbation points α_i^(k) are computed according to (20).

Theorem 3 Let Assumptions 1, 2, and 4 hold, and let ρ_1 ≤ 1/G_F̄. Assume that the step size sequence {a_k} is non-increasing and such that a_k > 0 for all k ≥ 1, ∑_{k=1}^∞ a_k = ∞ and ∑_{k=1}^∞ a_k² < ∞. Let the sequences {x̂^(k)} and {λ_i^(k)}, i = 1, …, N, be generated by Algorithm 1 using the perturbation points in (20) and (19b). Then {x̂^(k)} and {λ_i^(k)}, i = 1, …, N, converge to an optimal primal solution x⋆ ∈ X and an optimal dual solution λ⋆ of problem (2), respectively.

The proofs of Theorems 2 and 3 are presented in the next two subsections, respectively.

C. Proof of Theorem 2

In this subsection, we present the major steps for proving Theorem 2. Three key lemmas that will be used in the proof are presented first. The first is a deterministic version of the lemma in [45, Lemma 11, Chapter 2.2]:
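The running weighted average (36) need not be recomputed from scratch at every iteration: it satisfies the recursion x̂^(k) = x̂^(k−1) + (a_k/A_k)(x^(k−1) − x̂^(k−1)), which follows from A_k x̂^(k) = A_{k−1} x̂^(k−1) + a_k x^(k−1). The sketch below checks this recursion against the direct definition; the step sizes a_ℓ = ℓ^(−0.6) are just one example satisfying the conditions of Theorems 2–3:

```python
import numpy as np

rng = np.random.default_rng(3)
iterates = rng.normal(size=50)          # stand-ins for x_i^(0), ..., x_i^(49)
a = np.arange(1, 51) ** -0.6            # a_1, ..., a_50: positive, non-increasing

# direct evaluation of (36) for k = 50
A_k = a.sum()
xhat_direct = (a * iterates).sum() / A_k

# incremental evaluation via the recursion
xhat, A = 0.0, 0.0
for al, xl in zip(a, iterates):
    A += al
    xhat += (al / A) * (xl - xhat)
```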

Lemma 1 Let {bk }, {dk } and {ck } be non-negative sequences. Suppose that bk ≤ bk−1 − dk−1 + ck−1 then the sequence {bk } converges and

P∞

k=1

∀ k ≥ 1,

P∞

k=1 ck

< ∞ and

dk < ∞.

Moreover, by extending the results in [17, Theorem 4.2] and [11, Lemma 8(a)], we establish the following result on the consensus of $\{\lambda_i^{(k)}\}$, $\{y_i^{(k)}\}$, and $\{z_i^{(k)}\}$ among the agents.

Lemma 2: Suppose that Assumptions 1 and 4 hold. If $\{a_k\}$ is a positive, non-increasing sequence satisfying $\sum_{k=1}^\infty a_k^2 < \infty$, then
$$\sum_{k=1}^\infty a_k \|\lambda_i^{(k)} - \hat\lambda^{(k)}\| < \infty, \qquad \lim_{k\to\infty} \|\lambda_i^{(k)} - \hat\lambda^{(k)}\| = 0, \qquad (37)$$
$$\sum_{k=1}^\infty a_k \|\tilde\lambda_i^{(k)} - \hat\lambda^{(k-1)}\| < \infty, \qquad \lim_{k\to\infty} \|\tilde\lambda_i^{(k)} - \hat\lambda^{(k-1)}\| = 0, \qquad (38)$$
$$\sum_{k=1}^\infty a_k \|\tilde y_i^{(k)} - \hat y^{(k-1)}\| < \infty, \qquad \lim_{k\to\infty} \|\tilde y_i^{(k)} - \hat y^{(k-1)}\| = 0, \qquad (39)$$
$$\sum_{k=1}^\infty a_k \|\tilde z_i^{(k)} - \hat z^{(k-1)}\| < \infty, \qquad \lim_{k\to\infty} \|\tilde z_i^{(k)} - \hat z^{(k-1)}\| = 0, \qquad (40)$$
for all $i = 1, \ldots, N$, where
$$\hat y^{(k)} = \frac{1}{N}\sum_{i=1}^N f_i(x_i^{(k)}), \qquad \hat z^{(k)} = \frac{1}{N}\sum_{i=1}^N g_i(x_i^{(k)}), \qquad \hat\lambda^{(k)} = \frac{1}{N}\sum_{i=1}^N \lambda_i^{(k)}. \qquad (41)$$

The proof is omitted here due to the space limitation; interested readers may refer to the electronic companion [46]. Lemma 2 implies that the local variables $\lambda_i^{(k)}$, $y_i^{(k)}$ and $z_i^{(k)}$ at the distributed agents will eventually achieve consensus on the values of $\hat\lambda^{(k)}$, $\hat y^{(k)}$ and $\hat z^{(k)}$, respectively.
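The average consensus mechanism that drives this agreement can be sketched in a few lines. The following is a minimal illustration only (a 4-agent ring with hand-picked doubly stochastic weights, not the paper's Algorithm 1 or its network model): repeatedly mixing local values with a doubly stochastic matrix drives every agent's value to the network-wide average.

```python
import numpy as np

# Doubly stochastic mixing matrix for a 4-agent ring (self-weight 0.5,
# neighbor weights 0.25); every row and every column sums to one.
W = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])

x = np.array([4.0, -2.0, 7.0, 1.0])   # one local value per agent; average = 2.5
for _ in range(200):
    x = W @ x                          # each agent re-averages with its neighbors
print(x)                               # every entry is (numerically) 2.5
```

Because $W$ is doubly stochastic and the graph is connected with positive self-weights, $W^k \to \frac{1}{N}\mathbf{1}\mathbf{1}^T$, so all entries converge to the initial average.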

The local perturbation points $\alpha_i^{(k)}$ and $\beta_i^{(k)}$ in (19) and (20) will also achieve consensus asymptotically. In particular, following (11), we define
$$\hat\alpha_i^{(k)} = P_{\mathcal{X}_i}\!\left(x_i^{(k-1)} - \rho_1\big[\nabla f_i^T(x_i^{(k-1)})\nabla F(N\hat y^{(k)}) + \nabla g_i^T(x_i^{(k-1)})\hat\lambda^{(k-1)}\big]\right), \qquad (42a)$$
$$\hat\beta^{(k)} = P_{\mathcal{D}}\!\left(\hat\lambda^{(k-1)} + \rho_2 N \hat z^{(k)}\right), \qquad (42b)$$
for $i = 1, \ldots, N$, as the 'centralized' counterparts of (19); similarly, following (12), we define
$$\hat\alpha_i^{(k)} = \arg\min_{\alpha_i \in \mathcal{X}_i}\ g_i^T(\alpha_i)\hat\lambda^{(k-1)} + \frac{1}{2\rho_1}\big\|\alpha_i - \big(x_i^{(k-1)} - \rho_1 \nabla f_i^T(x_i^{(k-1)})\nabla F(N\hat y^{(k-1)})\big)\big\|^2, \qquad (43)$$

for $i = 1, \ldots, N$, as the centralized counterparts of the proximal perturbation point in (20). We show in Appendix A the following lemma:

Lemma 3: Let Assumptions 1 and 2 hold. For $\{\alpha_i^{(k)}, \beta_i^{(k)}\}_{i=1}^N$ in (19) and $(\hat\alpha_1^{(k)}, \ldots, \hat\alpha_N^{(k)}, \hat\beta^{(k)})$ in (42), it holds that
$$\|\hat\alpha_i^{(k)} - \alpha_i^{(k)}\| \le \rho_1 L_g \sqrt{P}\,\|\tilde\lambda_i^{(k)} - \hat\lambda^{(k-1)}\| + \rho_1 G_F L_f \sqrt{M}\, N \|\tilde y_i^{(k)} - \hat y^{(k-1)}\|, \qquad (44)$$
$$\|\hat\beta^{(k)} - \beta_i^{(k)}\| \le \|\tilde\lambda_i^{(k)} - \hat\lambda^{(k-1)}\| + \rho_2 N \|\tilde z_i^{(k)} - \hat z^{(k-1)}\|, \qquad (45)$$
for $i = 1, \ldots, N$. Equation (44) also holds for the proximal perturbation point $\alpha_i^{(k)}$ in (20) and $\hat\alpha_i^{(k)}$ in (43).

Lemma 3 says that, when the $\tilde\lambda_i^{(k)}$, $\tilde y_i^{(k)}$ and $\tilde z_i^{(k)}$ at the distributed agents achieve consensus, each $\alpha_i^{(k)}$ converges to $\hat\alpha_i^{(k)}$, and all the $\beta_i^{(k)}$ converge to the common point $\hat\beta^{(k)}$.

Now we are ready to prove Theorem 2. The proof primarily consists of showing two facts: (a) the primal-dual iterate pairs $(\hat x_1^{(k)}, \ldots, \hat x_N^{(k)}, \hat\lambda^{(k)})$ will converge to a saddle point of (15), and (b) $(\hat x_1^{(k)}, \ldots, \hat x_N^{(k)}, \hat\lambda^{(k)})$ asymptotically satisfies the primal-dual optimality conditions in Proposition 1. Thus, $(\hat x_1^{(k)}, \ldots, \hat x_N^{(k)}, \hat\lambda^{(k)})$ is asymptotically primal-dual optimal to problem (2).

To show the first fact, we use (21), (22) and Lemma 3 to characterize the basic relations of the primal and dual iterates.

Lemma 4: Let Assumptions 1 and 2 hold. Then, for any $x = (x_1^T, \ldots, x_N^T)^T \in \mathcal{X}$ and $\lambda \in \mathcal{D}$, the following two inequalities are true:
$$\begin{aligned}
\|x^{(k)} - x\|^2 \le{}& \|x^{(k-1)} - x\|^2 - 2a_k\big(L(x^{(k-1)}, \hat\beta^{(k)}) - L(x, \hat\beta^{(k)})\big) + a_k^2 N\big(\sqrt{M}\, L_f L_F + D_\lambda \sqrt{P}\, L_g\big)^2 \\
&+ 2a_k N D_x \sqrt{M}\, L_f G_F \sum_{i=1}^N \|\tilde y_i^{(k)} - \hat y^{(k-1)}\| \\
&+ 2a_k D_x \sqrt{P}\, L_g \sum_{i=1}^N \big(\|\tilde\lambda_i^{(k)} - \hat\lambda^{(k-1)}\| + \rho_2 N \|\tilde z_i^{(k)} - \hat z^{(k-1)}\|\big), \qquad (46)
\end{aligned}$$
$$\begin{aligned}
\sum_{i=1}^N \|\lambda_i^{(k)} - \lambda\|^2 \le{}& \sum_{i=1}^N \|\lambda_i^{(k-1)} - \lambda\|^2 + 2a_k\big(L(\hat\alpha^{(k)}, \hat\lambda^{(k-1)}) - L(\hat\alpha^{(k)}, \lambda)\big) + a_k^2 N C_g^2 \\
&+ 2a_k\big(2\rho_1 D_\lambda \sqrt{P}\, L_g^2 + C_g\big) \sum_{i=1}^N \|\tilde\lambda_i^{(k)} - \hat\lambda^{(k-1)}\| + 4\rho_1 N D_\lambda G_F \sqrt{PM}\, L_g L_f\, a_k \sum_{i=1}^N \|\tilde y_i^{(k)} - \hat y^{(k-1)}\|. \qquad (47)
\end{aligned}$$


The detailed proof is given in the electronic companion [46]. The second ingredient is a relation between the primal-dual iterates $(x^{(k-1)}, \hat\lambda^{(k-1)})$ and the perturbation points $(\hat\alpha^{(k)}, \hat\beta^{(k)})$, as given below.

Lemma 5: Let Assumptions 1, 2 and 3 hold. For the gradient perturbation points $(\hat\alpha^{(k)}, \hat\beta^{(k)})$ in (42), it holds true that
$$L(x^{(k-1)}, \hat\beta^{(k)}) - L(\hat\alpha^{(k)}, \hat\lambda^{(k-1)}) \ge \left(\frac{1}{\rho_1} - (G_{\bar F} + D_\lambda \sqrt{P}\, G_g)\right)\|x^{(k-1)} - \hat\alpha^{(k)}\|^2 + \frac{1}{\rho_2}\|\hat\lambda^{(k-1)} - \hat\beta^{(k)}\|^2. \qquad (48)$$
Moreover, let $\rho_1 \le 1/(G_{\bar F} + D_\lambda \sqrt{P}\, G_g)$, and suppose that $L(x^{(k-1)}, \hat\beta^{(k)}) - L(\hat\alpha^{(k)}, \hat\lambda^{(k-1)}) \to 0$ and $(x^{(k-1)}, \lambda^{(k-1)})$ converges to some limit point $(\hat x^\star, \hat\lambda^\star) \in \mathcal{X} \times \mathcal{D}$ as $k \to \infty$. Then $(\hat x^\star, \hat\lambda^\star)$ is a saddle point of (15).

The proof is presented in Appendix B. Using the preceding lemmas, we show the first key fact, namely, that $(\hat x_1^{(k)}, \ldots, \hat x_N^{(k)}, \hat\lambda^{(k)})$ converges to a saddle point of (15).

Lemma 6: Let Assumptions 1-4 hold, and let $\rho_1 \le 1/(G_{\bar F} + D_\lambda \sqrt{P}\, G_g)$. Assume that the step size $a_k > 0$ is a non-increasing sequence satisfying $\sum_{k=1}^\infty a_k = \infty$ and $\sum_{k=1}^\infty a_k^2 < \infty$. Then
$$\lim_{k\to\infty}\|x_i^{(k)} - \hat x_i^\star\| = 0,\ i = 1, \ldots, N, \qquad \lim_{k\to\infty}\|\hat\lambda^{(k)} - \hat\lambda^\star\| = 0, \qquad (49)$$
$$\lim_{k\to\infty}\|x^{(k-1)} - \hat\alpha^{(k)}\| = 0, \qquad \lim_{k\to\infty}\|\hat\lambda^{(k-1)} - \hat\beta^{(k)}\| = 0, \qquad (50)$$
where $\hat x^\star = ((\hat x_1^\star)^T, \ldots, (\hat x_N^\star)^T)^T \in \mathcal{X}$ and $\hat\lambda^\star \in \mathcal{D}$ form a saddle point of problem (15).

Proof: By the compactness of the set $\mathcal{X}$ and the continuity of the functions $\bar F$ and $g_i$, problem (2) has a solution. Due to the Slater condition, the dual problem also has a solution. By construction of the set $\mathcal{D}$ in (16), all dual optimal solutions are contained in the set $\mathcal{D}$. We let $x^\star = ((x_1^\star)^T, \ldots, (x_N^\star)^T)^T \in \mathcal{X}$ and $\lambda^\star \in \mathcal{D}$ be an arbitrary saddle point of (15), and we apply Lemma 4 with $x = (x_1^T, \ldots, x_N^T)^T = x^\star$ and $\lambda = \lambda^\star$. By summing (46) and (47), we obtain the following inequality
$$\begin{aligned}
\|x^{(k)} - x^\star\|^2 + \sum_{i=1}^N \|\lambda_i^{(k)} - \lambda^\star\|^2 \le{}& \|x^{(k-1)} - x^\star\|^2 + \sum_{i=1}^N \|\lambda_i^{(k-1)} - \lambda^\star\|^2 \\
&+ \tilde c_k - 2a_k\big(L(x^{(k-1)}, \hat\beta^{(k)}) - L(x^\star, \hat\beta^{(k)}) - L(\hat\alpha^{(k)}, \hat\lambda^{(k-1)}) + L(\hat\alpha^{(k)}, \lambda^\star)\big), \qquad (51)
\end{aligned}$$

where
$$\begin{aligned}
\tilde c_k \triangleq{}& a_k^2 N\big[(\sqrt{M}\, L_f L_F + D_\lambda \sqrt{P}\, L_g)^2 + C_g^2\big] + 2\big[D_x \sqrt{P}\, L_g + C_g + 2\rho_1 \sqrt{P}\, D_\lambda L_g^2\big]\sum_{i=1}^N a_k\|\tilde\lambda_i^{(k)} - \hat\lambda^{(k-1)}\| \\
&+ 2N\sqrt{M}\, L_f G_F\big(D_x + 2\rho_1 D_\lambda \sqrt{P}\, L_g\big)\sum_{i=1}^N a_k\|\tilde y_i^{(k)} - \hat y^{(k-1)}\| + 2N\rho_2 D_x \sqrt{P}\, L_g \sum_{i=1}^N a_k\|\tilde z_i^{(k)} - \hat z^{(k-1)}\|.
\end{aligned}$$
First of all, by Theorem 1, we have
$$L(\hat\alpha^{(k)}, \lambda^\star) - L(x^\star, \lambda^\star) \ge 0, \qquad L(x^\star, \lambda^\star) - L(x^\star, \hat\beta^{(k)}) \ge 0, \qquad (52)$$
implying that $L(\hat\alpha^{(k)}, \lambda^\star) - L(x^\star, \hat\beta^{(k)}) \ge 0$. Hence we deduce from (51) and (52) that
$$\|x^{(k)} - x^\star\|^2 + \sum_{i=1}^N \|\lambda_i^{(k)} - \lambda^\star\|^2 \le \|x^{(k-1)} - x^\star\|^2 + \sum_{i=1}^N \|\lambda_i^{(k-1)} - \lambda^\star\|^2 + \tilde c_k - 2a_k\big(L(x^{(k-1)}, \hat\beta^{(k)}) - L(\hat\alpha^{(k)}, \hat\lambda^{(k-1)})\big). \qquad (53)$$
Secondly, by $\sum_{k=1}^\infty a_k^2 < \infty$ and by Lemma 2, we see that all four terms in $\tilde c_k$ are summable over $k$, and thus $\sum_{k=1}^\infty \tilde c_k < \infty$. Thirdly, by Lemma 5 and under the premise of $\rho_1 \le 1/(G_{\bar F} + D_\lambda \sqrt{P}\, G_g)$, we have $L(x^{(k-1)}, \hat\beta^{(k)}) - L(\hat\alpha^{(k)}, \hat\lambda^{(k-1)}) \ge 0$. Therefore, by applying Lemma 1 to relation (53), we conclude that the sequence $\{\|x^{(k)} - x^\star\|^2 + \sum_{i=1}^N \|\lambda_i^{(k)} - \lambda^\star\|^2\}$ converges for any saddle point $(x^\star, \lambda^\star)$, and it holds that $\sum_{k=1}^\infty a_k\big(L(x^{(k-1)}, \hat\beta^{(k)}) - L(\hat\alpha^{(k)}, \hat\lambda^{(k-1)})\big) < \infty$. Because $\sum_{k=1}^\infty a_k = \infty$, the preceding relation implies that
$$\liminf_{k\to\infty}\ L(x^{(k-1)}, \hat\beta^{(k)}) - L(\hat\alpha^{(k)}, \hat\lambda^{(k-1)}) = 0. \qquad (54)$$

Equation (54) implies that there exists a subsequence $\ell_1, \ell_2, \ldots$ such that
$$L(x^{(\ell_k - 1)}, \hat\beta^{(\ell_k)}) - L(\hat\alpha^{(\ell_k)}, \hat\lambda^{(\ell_k - 1)}) \to 0 \ \text{ as } k \to \infty. \qquad (55)$$
According to Lemma 5, the above equation indicates that
$$\lim_{k\to\infty}\|x^{(\ell_k - 1)} - \hat\alpha^{(\ell_k)}\| = 0, \qquad \lim_{k\to\infty}\|\hat\lambda^{(\ell_k - 1)} - \hat\beta^{(\ell_k)}\| = 0. \qquad (56)$$
Moreover, because $\{(x^{(\ell_k - 1)}, \hat\lambda^{(\ell_k - 1)})\} \subset \mathcal{X} \times \mathcal{D}$ is a bounded sequence, there must exist a limit point, say $(\hat x^\star, \hat\lambda^\star) \in \mathcal{X} \times \mathcal{D}$, such that
$$x^{(\ell_k - 1)} \to \hat x^\star, \qquad \hat\lambda^{(\ell_k - 1)} \to \hat\lambda^\star, \quad \text{as } k \to \infty. \qquad (57)$$
Under the premise of $\rho_1 \le 1/(G_{\bar F} + D_\lambda \sqrt{P}\, G_g)$, and by (55) and (57), we obtain from Lemma 5 that $(\hat x^\star, \hat\lambda^\star) \in \mathcal{X} \times \mathcal{D}$ is a saddle point of (15). Moreover, because
$$\|x^{(\ell_k)} - \hat x^\star\|^2 + \sum_{i=1}^N \|\lambda_i^{(\ell_k)} - \hat\lambda^\star\|^2 \le \|x^{(\ell_k)} - \hat x^\star\|^2 + \sum_{i=1}^N \big(\|\lambda_i^{(\ell_k)} - \hat\lambda^{(\ell_k)}\| + \|\hat\lambda^{(\ell_k)} - \hat\lambda^\star\|\big)^2,$$
we obtain from Lemma 2 and (57) that the sequence $\{\|x^{(k)} - \hat x^\star\|^2 + \sum_{i=1}^N \|\lambda_i^{(k)} - \hat\lambda^\star\|^2\}$ has a limit value equal to zero. Since the sequence $\{\|x^{(k)} - x^\star\|^2 + \sum_{i=1}^N \|\lambda_i^{(k)} - \lambda^\star\|^2\}$ converges for any saddle point of (15), we conclude that $\{\|x^{(k)} - \hat x^\star\|^2 + \sum_{i=1}^N \|\lambda_i^{(k)} - \hat\lambda^\star\|^2\}$ in fact converges to zero, and therefore (49) is proved. Finally, relation (50) can also be obtained by (49), (53) and (48), provided that $\rho_1 \le 1/(G_{\bar F} + D_\lambda \sqrt{P}\, G_g)$. $\blacksquare$

According to [44, Lemma 3], if $x^{(k)} \to x^\star$ as $k \to \infty$, then its weighted running average $\hat x^{(k)}$ defined in (36) also converges to $x^\star$ as $k \to \infty$. What remains is to show the second fact, that $(\hat x^{(k)}, \hat\lambda^{(k)})$ asymptotically satisfies the optimality conditions given by Proposition 1. We prove in Appendix C that the following lemma holds.

Lemma 7: Under the assumptions of Lemma 6, it holds that
$$\lim_{k\to\infty}\left\|\left[\sum_{i=1}^N g_i(\hat x_i^{(k)})\right]^+\right\| = 0, \qquad \lim_{k\to\infty}\ (\hat\lambda^{(k)})^T\left(\sum_{i=1}^N g_i(\hat x_i^{(k)})\right) = 0. \qquad (58)$$

By Lemma 6, Lemma 7 and Proposition 1, we conclude that Theorem 2 is true. Finally, we remark that when the step size $a_k$ has the form $a/(b+k)$, where $a > 0$, $b \ge 0$, one can simply consider the running average below [44]
$$\bar x^{(k)} = \frac{1}{k}\sum_{\ell=0}^{k-1} x^{(\ell)} = \left(1 - \frac{1}{k}\right)\bar x^{(k-1)} + \frac{1}{k}\,x^{(k-1)}, \qquad (59)$$
instead of the running weighted-average in (36), while Lemma 7 still holds true.
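The algebraic identity behind the recursive form in (59) is easy to verify numerically; the iterate sequence below is an arbitrary stand-in for $\{x^{(\ell)}\}$:

```python
# Sanity check of the recursion in (59):
#   xbar_k = (1 - 1/k) * xbar_{k-1} + (1/k) * x_{k-1}
# reproduces the batch average (1/k) * sum_{l=0}^{k-1} x_l exactly.
x = [0.5 * (0.9 ** l) + 1.0 for l in range(50)]   # any iterate sequence

xbar = 0.0
for k in range(1, len(x) + 1):
    xbar = (1.0 - 1.0 / k) * xbar + (1.0 / k) * x[k - 1]

batch = sum(x) / len(x)
print(abs(xbar - batch))  # ~0, up to floating-point round-off
```

The recursive form is useful in practice because an agent need not store past iterates to maintain the running average.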

D. Proof of Theorem 3

Theorem 3 essentially can be obtained along the same lines as the proof of Theorem 2, except for Lemma 5. What we need to show here is that the centralized proximal perturbation point $\hat\alpha^{(k)}$ in (43), $\hat\beta^{(k)}$ in (42b), and the primal-dual iterates $(x^{(k-1)}, \lambda^{(k-1)})$ satisfy a result similar to Lemma 5. The lemma below is proved in Appendix D:

Lemma 8: Let Assumptions 1 and 2 hold. For the centralized perturbation points $\hat\alpha^{(k)}$ in (43) and $\hat\beta^{(k)}$ in (42b), it holds true that
$$L(x^{(k-1)}, \hat\beta^{(k)}) - L(\hat\alpha^{(k)}, \hat\lambda^{(k-1)}) \ge \left(\frac{1}{2\rho_1} - \frac{G_{\bar F}}{2}\right)\|x^{(k-1)} - \hat\alpha^{(k)}\|^2 + \frac{1}{\rho_2}\|\hat\lambda^{(k-1)} - \hat\beta^{(k)}\|^2. \qquad (60)$$

Moreover, let $\rho_1 \le 1/G_{\bar F}$, and let $L(x^{(k-1)}, \hat\beta^{(k)}) - L(\hat\alpha^{(k)}, \hat\lambda^{(k-1)}) \to 0$ and $(x^{(k-1)}, \lambda^{(k-1)}) \to (\hat x^\star, \hat\lambda^\star)$ as $k \to \infty$, where $(\hat x^\star, \hat\lambda^\star) \in \mathcal{X} \times \mathcal{D}$. Then $(\hat x^\star, \hat\lambda^\star)$ is a saddle point of (15).

V. SIMULATION RESULTS

In this section, we examine the efficacy of the proposed distributed PDP method (Algorithm 1) by considering the DSM problem discussed in Section II-B. We consider the DSM problem presented in (3) and (4). The cost functions were set to $C_p(\cdot) = \pi_p\|\cdot\|^2$ and $C_s(\cdot) = \pi_s\|\cdot\|^2$, respectively, where $\pi_p$ and $\pi_s$ are price parameters. The load profile function $\psi_i(x_i)$ is based on the load model in [18], which was proposed to model deferrable, non-interruptible loads such as electric vehicles, washing machines and tumble dryers. According to [18], $\psi_i(x_i)$ can be modeled as a linear function, i.e., $\psi_i(x_i) = \Psi_i x_i$, where $\Psi_i \in \mathbb{R}^{T\times T}$ is a coefficient matrix composed of the load profiles of the appliances of customer $i$. The control variable $x_i \in \mathbb{R}^T$ determines the operation scheduling of the appliances of customer $i$. Due to physical conditions and quality-of-service constraints, each $x_i$ is subject to a local constraint set $\mathcal{X}_i = \{x_i \in \mathbb{R}^T \mid A_i x_i \preceq b_i,\ l_i \le x_i \le u_i\}$, where $A_i \in \mathbb{R}^{T\times T}$ and $l_i, u_i \in \mathbb{R}^T$ [18]. The problem formulation corresponding to (3) is thus given by
$$\min_{x_i \in \mathcal{X}_i,\, i=1,\ldots,N}\ \pi_p\left\|\Big[\sum_{i=1}^N \Psi_i x_i - p\Big]^+\right\|^2 + \pi_s\left\|\Big[p - \sum_{i=1}^N \Psi_i x_i\Big]^+\right\|^2. \qquad (61)$$
Analogous to (4), problem (61) can be reformulated as
$$\min_{x_i \in \mathcal{X}_i,\, i=1,\ldots,N,\ z \succeq 0}\ \pi_p\|z\|^2 + \pi_s\left\|z - \sum_{i=1}^N \Psi_i x_i + p\right\|^2 \qquad (62a)$$
$$\text{s.t.}\quad \sum_{i=1}^N \Psi_i x_i - p - z \preceq 0, \qquad (62b)$$

to which the proposed distributed PDP method can be applied. We consider a scenario with 400 customers ($N = 400$), and follow the same methods as in [47] to generate the power bidding $p$ and the coefficients $\Psi_i, A_i, b_i, l_i, u_i$, $i = 1, \ldots, N$. The network graph $\mathcal{G}$ was randomly generated. The price parameters $\pi_p$ and $\pi_s$ were simply set to $1/N$ and $0.8/N$, respectively. In addition to the distributed PD method in [15], we also compare the proposed distributed PDP method with the distributed dual subgradient (DDS) method [18], [25]. This method is based on the same idea as the dual decomposition technique [25], where, given the dual variables, each customer globally solves the corresponding inner minimization problem. The average consensus subgradient technique [10] is applied in the dual domain for distributed dual optimization.

Figure 1(a) shows the convergence curves of the three methods under test. The curves shown in this figure are the corresponding objective values in (61) of the running average iterates of the three methods. The step size of the distributed PD method in [15] was set to $a_k = 0.1/(10+k)$ and that of the DDS method was set to $a_k = 0.05/(10+k)$. For the proposed distributed PDP method, $a_k$, $\rho_1$ and $\rho_2$ were respectively set to $a_k = 15/(10+k)$ and $\rho_1 = \rho_2 = 0.001$. From this figure, we observe that the proposed distributed PDP method and the DDS method exhibit comparable convergence behavior; both methods converge within 100 iterations and outperform the distributed PD method in [15]. One should note that the DDS method is computationally more expensive than the proposed distributed PDP method since, in each iteration, the former requires globally solving the inner minimization problem while the latter takes only two primal gradient updates. For the proposed PDP Algorithm 1, the complexity order per iteration per customer is $\mathcal{O}(4T)$ [see (19), (21) and (22)]. For the DDS method, each customer has to solve the inner linear program (LP) in (63), $\min_{x_i\in\mathcal{X}_i} (\lambda - \eta)^T \Psi_i x_i$, per iteration. According to [48], the worst-case complexity of interior point methods for solving an LP is $\mathcal{O}(T^{0.5}(3T^2 + T^3)) \approx \mathcal{O}(T^{3.5})$.
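As an aside, the equivalence between (61) and the reformulation (62) can be sanity-checked numerically. The sketch below uses synthetic $\Psi_i$, $p$ and prices (illustration only, not the simulation data generated as in [47]): for fixed $x$, the inner minimizer of (62) over $z$ subject to $z \succeq 0$ and (62b) is $z^\star = [\sum_i \Psi_i x_i - p]^+$, which recovers the cost in (61).

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 6, 3
pi_p, pi_s = 1.0 / N, 0.8 / N
Psi = [rng.random((T, T)) for _ in range(N)]   # synthetic load-profile matrices
x = [rng.random(T) for _ in range(N)]          # fixed schedules
p = rng.random(T) * 3.0                        # synthetic power bidding

u = sum(Pi @ xi for Pi, xi in zip(Psi, x)) - p
cost61 = pi_p * np.sum(np.maximum(u, 0.0) ** 2) + pi_s * np.sum(np.maximum(-u, 0.0) ** 2)

z_star = np.maximum(u, 0.0)                    # candidate inner minimizer of (62)
cost62 = pi_p * np.sum(z_star ** 2) + pi_s * np.sum((z_star - u) ** 2)
assert abs(cost61 - cost62) < 1e-12            # (62) at z* reproduces (61)

# z* is feasible (z >= 0 and z >= u), and no other feasible z does better:
for _ in range(1000):
    z = z_star + rng.random(T)                 # any feasible point is z* plus a non-negative offset
    val = pi_p * np.sum(z ** 2) + pi_s * np.sum((z - u) ** 2)
    assert val >= cost62 - 1e-12
print("reformulation (62) matches (61) on this instance")
```

The per-coordinate objective is convex and increasing on the feasible set beyond $z^\star$, which is why the candidate always wins.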

Footnote 3: One can utilize the linear structure to show that (61) is equivalent to the following saddle point problem (by Lagrange duality):
$$\max_{\lambda \succeq 0,\ \eta \succeq 0}\ \min_{x_i \in \mathcal{X}_i,\, i=1,\ldots,N}\ -\frac{1}{4\pi_p}\|\lambda\|^2 - \frac{1}{4\pi_s}\|\eta\|^2 + (\lambda - \eta)^T\left(\sum_{i=1}^N \Psi_i x_i - p\right), \qquad (63)$$
to which the method in [15] and the DDS method [25] can be applied.

In Figure 1(b), we display the load profiles of the power supply and the unscheduled load (without DSM), while, in Figure 1(c), we show the load profiles scheduled by the three optimization methods under consideration. The results were obtained by respectively combining each of the optimization methods with the certainty equivalent control (CEC) approach in [18, Algorithm 1] to handle a stochastic counterpart of problem (61). The stopping criterion was set to the maximum iteration number of 500. We can observe from this figure that, for all three methods, the power balancing is much improved compared to that without DSM control. However, we can still observe from Figure 1(c) that the proposed PDP method and the DDS method exhibit better results than the distributed PD method in [15]. Specifically, the cost in (61) is $4.49 \times 10^4$ KW for the unscheduled load, whereas that of the load scheduled by the proposed distributed PDP method is $2.44 \times 10^4$ KW (a 45.65% reduction). The cost for the load scheduled by the distributed DDS method is slightly lower, at $2.38 \times 10^4$ KW; whereas

that scheduled by the distributed PD method in [15] has a higher cost of 3.81 × 104 KW.

As discussed in Section II-B, problem (2) also encompasses important regression problems. In [36], we have applied the proposed PDP method to solving a distributed sparse regression problem (with a non-smooth constraint function). The simulation results can be found in [36].

VI. CONCLUSIONS

We have presented a distributed consensus-based PDP algorithm for solving problems of the form (2), which have a globally coupled cost function and inequality constraints. The algorithm employs the average consensus technique and the primal-dual perturbed (sub-)gradient method. We have provided a convergence analysis showing that the proposed algorithm enables the agents across the network to achieve a global optimal primal-dual solution of the considered problem in a distributed manner. The effectiveness of the proposed algorithm has been demonstrated by applying it to a smart grid demand response control problem and a sparse linear regression problem [36]. In particular, the proposed algorithm is shown to have better convergence properties than the distributed PD method in [15], which does not employ perturbation. In addition, the proposed algorithm performs comparably with the distributed dual subgradient method [25] for the demand response control problem, even though the former is computationally cheaper.

APPENDIX A
PROOF OF LEMMA 3

We first show (45). By the definitions in (42b) and (19b), and by the non-expansiveness of the projection, we readily obtain
$$\|\hat\beta^{(k)} - \beta_i^{(k)}\| \le \big\|P_{\mathcal{D}}\big(\tilde\lambda_i^{(k)} + \rho_2 N \tilde z_i^{(k)}\big) - P_{\mathcal{D}}\big(\hat\lambda^{(k-1)} + \rho_2 N \hat z^{(k-1)}\big)\big\| \le \|\tilde\lambda_i^{(k)} - \hat\lambda^{(k-1)}\| + \rho_2 N \|\tilde z_i^{(k)} - \hat z^{(k-1)}\|.$$
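The non-expansiveness of the projection operator invoked in this step is easy to confirm numerically; the box set below is just an arbitrary example of a closed convex set:

```python
import numpy as np

def proj_box(v, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n (a closed convex set)."""
    return np.clip(v, lo, hi)

rng = np.random.default_rng(1)
for _ in range(10000):
    a = rng.normal(size=5) * 3.0
    b = rng.normal(size=5) * 3.0
    # ||P(a) - P(b)|| <= ||a - b|| must hold for every pair of points
    assert np.linalg.norm(proj_box(a) - proj_box(b)) <= np.linalg.norm(a - b) + 1e-12
print("non-expansiveness held on all sampled pairs")
```

The same inequality holds for projection onto any closed convex set, which is exactly what the bound above uses for $P_{\mathcal{D}}$.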

Fig. 1: Numerical results for the smart grid DSM problem (61) with 400 customers. (a) Convergence curves (primal objective value versus iteration number) of the proposed distributed PDP method, the distributed dual subgradient method in [25], [18], and the distributed PD method in [15]; (b) load profiles (KW versus time of day) of the power supply and the unscheduled load; (c) load profiles scheduled by the three methods, shown against the power supply.

Equation (44) for the $\alpha_i^{(k)}$ in (19a) and the $\hat\alpha_i^{(k)}$ in (42a) can be shown along similar lines:
$$\begin{aligned}
\|\alpha_i^{(k)} - \hat\alpha_i^{(k)}\| ={}& \big\|P_{\mathcal{X}_i}\big(x_i^{(k-1)} - \rho_1[\nabla f_i^T(x_i^{(k-1)})\nabla F(N\tilde y_i^{(k)}) + \nabla g_i^T(x_i^{(k-1)})\tilde\lambda_i^{(k)}]\big) \\
&\ - P_{\mathcal{X}_i}\big(x_i^{(k-1)} - \rho_1[\nabla f_i^T(x_i^{(k-1)})\nabla F(N\hat y^{(k-1)}) + \nabla g_i^T(x_i^{(k-1)})\hat\lambda^{(k-1)}]\big)\big\| \\
\le{}& \rho_1\|\nabla f_i^T(x_i^{(k-1)})\|\,\|\nabla F(N\tilde y_i^{(k)}) - \nabla F(N\hat y^{(k-1)})\| + \rho_1\|\nabla g_i(x_i^{(k-1)})\|\,\|\tilde\lambda_i^{(k)} - \hat\lambda^{(k-1)}\| \\
\le{}& \rho_1 L_g\sqrt{P}\,\|\tilde\lambda_i^{(k)} - \hat\lambda^{(k-1)}\| + \rho_1 G_F L_f\sqrt{M}\, N\,\|\tilde y_i^{(k)} - \hat y^{(k-1)}\|, \qquad \text{(A.1)}
\end{aligned}$$
where, in the second inequality, we have used the boundedness of the gradients (cf. (26), (28)) and the Lipschitz continuity of $\nabla F$ (Assumption 2).

To show that (44) holds for $\alpha_i^{(k)}$ in (20) and $\hat\alpha_i^{(k)}$ in (43), we use the following lemma:

in (43), we use the following lemma:

Lemma 9 [49, Lemma 4.1] If y ⋆ = arg miny∈Y J1 (y) + J2 (y), where J1 : Rn → R and

J2 : Rn → R are convex functions and Y is a closed convex set. Moreover, J2 is continuously

differentiable. Then y ⋆ = arg miny∈Y {J1 (y) + ∇J2T (y ⋆ )y}.

˜ (k) and By applying the above lemma to (20) using J1 (α1 ) = giT (αi )λ i 1 (k−1) (k−1) (k) kαi − (xi − ρ1 ∇fiT (xi )∇F (N y˜i ))k2 , J2 (αi ) = 2ρ1 we obtain (k) ˜ (k) + (∇f T (x(k−1) )∇F (N y˜(k) ) + 1 (α(k) − x(k−1) ))T αi . (A.2) αi = arg min giT (αi )λ i i i i i αi ∈Xi ρ1 i Similarly, applying Lemma 9 to (43), we obtain (k) (k−1) T (k) ˆ (k−1) + (∇f T (x(k−1) )∇F (N yˆ(k−1) ) + 1 (α ˆ − xi ˆ i = arg min giT (αi )λ )) αi . (A.3) α i i αi ∈Xi ρ1 i From (A.2) it follows that 1 (k) (k−1) T (k) (k) ˜ (k) (k−1) (k) T )) αi giT (αi )λ )∇F (N y˜i ) + (αi − xi i + (∇fi (xi ρ1 1 (k) (k) ˜ (k) (k−1) (k) (k−1) T (k) T ˆ i )λ ˆi , ≤ giT (α )∇F (N y˜i ) + (αi − xi )) α i + (∇fi (xi ρ1 which is equivalent to ˜ ˆ i ) − giT (αi ))λ 0 ≤ (giT (α i (k)

(k)

(k−1)

+ ∇fiT (xi

(k) (k)

(k)

(k)

ˆ i − αi ) + )∇F (N y˜i )(α

1 (k) (k−1) (k) (k) ˆ i − αi ). (A.4) (αi − xi )(α ρ1

Similarly, equation (A.3) implies that (k)

(k)

ˆ (k−1) ˆ i ))λ 0 ≤ (giT (αi ) − giT (α (k−1)

+ ∇fiT (xi

(k)

(k)

ˆi ) + )∇F (N yˆ(k−1) )(αi − α

1 (k) (k−1) (k) (k) ˆ − xi ˆ i ). (A.5) (α )(αi − α ρ1 i

By combining (A.4) and (A.5), we obtain 1 (k) (k) (k) (k) ˜ (k) − λ ˆ (k−1) ) ˆ i − αi k2 ≤ (giT (α ˆ i ) − giT (αi ))(λ kα i ρ1 (k−1)

(k)

(k)

(k)

ˆ i − αi ) + ∇fiT (xi )(∇F (N y˜i ) − ∇F (N yˆ(k−1) ))(α  √ √ (k) (k) ˜ (k) − λ ˆ (k−1) k + GF Lf M Nky˜(k) − yˆ(k−1) k kα ˆ i − αi k, P Lg kλ ≤ i i DRAFT

November 7, 2013

25

where we have used the boundedness of gradients (cf. (26), (28)), the Lipschitz continuity of ∇F (Assumption 2) as well as the Lipschitz continuity of gi (in (29)). The desired result in



(44) follows from the preceding relation. $\blacksquare$

APPENDIX B
PROOF OF LEMMA 5

ˆ(k) in (42) assuming ˆ (k) We first prove that relation (48) holds for the perturbation points α i and β that Assumption 3 is satisfied. Note that (42a) is equivalent to (k−1) ˆ (k−1) )k2 , i = 1, . . . , N, ˆ (k) α = arg min kαi − xi + ρ1 Lxi (x(k−1) , λ i αi ∈Xi

where Lxi (x

(k−1)

ˆ (k−1)



(k−1)

) = ∇fiT (xi

(k−1)

)∇F (N yˆ(k) ) + ∇giT (xi

ˆ (k−1) . By the optimality )λ

condition, we have that, for all xi ∈ Xi , (k)

(k)

(k−1)

ˆ i )T (α ˆ i − xi (xi − α (k−1)

By choosing xi = xi

ˆ (k−1) )) ≥ 0. + ρ1 Lxi (x(k−1) , λ

, one obtains

(k) (k) ˆ (k−1) ) ≥ 1 kx(k−1) − α ˆ i k2 , ˆ i )T Lxi (x(k−1) , λ −α i ρ1 which, by summing over i = 1, . . . , N, gives rise to ˆ (k−1) ) ≥ 1 kx(k−1) − α ˆ (k) )T Lx (x(k−1) , λ ˆ (k) k2 . (x(k−1) − α ρ1 Further write the above equation as follows (k−1)

(xi

ˆ (k−1) ) ˆ (k) )T Lx (α ˆ (k) , λ (x(k−1) − α

1 (k−1) ˆ (k−1) ) − Lx (α ˆ (k−1) )) ˆ (k) k2 − (x(k−1) − α ˆ (k) )T (Lx (x(k−1) , λ ˆ (k) , λ kx −α ρ1 1 ˆ (k−1) ) − Lx (α ˆ (k−1) )k. (A.6) ˆ (k) k2 − kx(k−1) − α ˆ (k) k × kLx (x(k−1) , λ ˆ (k) , λ ≥ kx(k−1) − α ρ1 ˆ (k−1) ∈ D, we can bound the By (8), Assumption 2, Assumption 3 and the boundedness of λ ≥

second term in (A.6) as ˆ (k−1) ) − Lx (α ˆ (k−1) )k ˆ (k) , λ kLx (x(k−1) , λ



(k−1) T )

 ∇g1 (x1

ˆ (k−1) k  ¯ (k−1) ) − ∇F¯ (α ˆ (k) )k + kλ ≤ k∇F(x

  √ ˆ (k) k, ≤ (GF¯ + Dλ P Gg )kx(k−1) − α

November 7, 2013

(k−1)

T ∇gN (xN

− .. .



(k) ˆ1 ) ∇g1T (α  (k)

T ˆN ) ) − ∇gN (α

   F

(A.7) DRAFT

26

where k · kF denotes the Frobenious norm. By combining (A.6) and (A.7), we obtain   √ 1 (k−1) (k) T (k) ˆ (k−1) ˆ (k) k2 . ˆ ) L x (α ˆ ,λ (x −α )≥ − (GF¯ + Dλ P Gg ) kx(k−1) − α ρ1

(A.8)

ˆ (k−1) ) − L(α ˆ (k−1) ) ≥ (x(k−1) − α ˆ (k−1) ) by the convexity ˆ (k) , λ ˆ (k) )T Lx (α ˆ (k) , λ Since L(x(k−1) , λ of L in x, we further obtain

 √ 1 ˆ ,λ ˆ (k) k2 . (A.9) L(x ,λ ) − L(α )≥ − (GF¯ + Dλ P Gg ) kx(k−1) − α ρ1 ˆ (k−1) −ρ2 PN gi (x(k−1) )k2 . On the other hand, by (42b), we know that βˆ(k) = arg minβ∈D kβ−λ i i=1 (k−1)

ˆ (k−1)

(k)



ˆ (k−1)

By the optimality condition and the linearity of L in λ, we have L(x

(k−1)

ˆ(k)



) − L(x

(k−1)

ˆ (k−1)



ˆ (k−1)

) = −(λ ≥

ˆ(k) T

−β

)

N X

(k−1) gi (xi )

i=1

1 ˆ (k−1) ˆ(k) 2 kλ −β k . ρ2

! (A.10)

Combining (A.9) and (A.10) yields (48). ˆ (k−1) ) → 0 and (x(k−1) , λ ˆ (k−1) ) converges to some ˆ (k) , λ Suppose that L(x(k−1) , βˆ(k) ) − L(α √ ˆ ⋆ ) as k → ∞. Since ρ1 ≤ 1/(GF¯ + Dλ P Gg ), we infer from (48) that ˆ ⋆, λ limit point (x ˆ (k−1) − βˆ(k) k → 0, as k → ∞. It then follows from (42) and the ˆ (k) k → 0 and kλ kx(k−1) − α ˆ ⋆ ) ∈ X × D satisfies ˆ ⋆, λ fact that projection is a continuous mapping [49] that (x  X   M ⋆ T ⋆ ⋆ ⋆ ˆ⋆ ⋆ T ˆ i = PXi x ˆ i )∇F ˆ i − ρ1 [∇fi (x ˆ i )λ ] , i = 1, . . . , N, ˆ i ) + ∇gi (x x f i (x i=1

ˆ ⋆ = PD (λ ˆ ⋆ + ρ2 λ

N X

ˆ ⋆i )) gi (x

i=1

ˆ ⋆) and λ ˆ ⋆ = arg maxλ∈D L(x ˆ ⋆ = arg minx∈X L(x, λ ˆ ⋆ , λ) i.e., which, respectively, imply that x ˆ ⋆ ) is a saddle point of problem (15). ˆ ⋆, λ (x  A PPENDIX C P ROOF

OF

L EMMA 7

ˆ (k−1) )+(λ−λ ˆ (k−1) )T Lλ (α ˆ (k−1) ), ˆ (k) , λ ˆ (k) , λ) = L(α ˆ (k) , λ By (47) in Lemma 4 and the fact of L(α we have 1 c¯k ˆ (k−1) )T g(α ˆ (k) ) ≤ + (λ − λ 2ak 2ak DRAFT

N X j=1

(k−1) kλj

− λk2 −

N X i=1

(k) kλi

− λk2

!

,

(A.11)

November 7, 2013

27

ˆ (k) ) = where g(α

PN

i=1

(k)

ˆ i ) and gi (α

√ ˜ (k) − λ ˆ (k−1) k + 4ρ1 NDλ GF P MLg Lf ak ky˜(k) − yˆ(k−1) k. c¯k , a2k NCg2 + 2ak (2ρ1 Dλ P L2g + Cg )kλ i i i By following a similar argument as in [27, Proposition 5.1] and by (A.11), (16), (29) and (30), one can show that ˆ ⋆ )T g(x(k−1) ) ≤ c¯k + 1 (λ − λ 2ak 2ak

N X j=1



(k−1)

kλj

− λk2 −

N X i=1

(k)

kλi − λk2

!

ˆ (k−1) − λ ˆ ⋆ k. ˆ (k) k + NCg kλ + 2N P Dλ Lg kx(k−1) − α

(A.12)

By taking the weighted running average of (A.12), we obtain k 1 X ⋆ T (k−1) ˆ ⋆ )T g(x(ℓ−1) ) ˆ ˆ aℓ (λ − λ (λ − λ ) g(x )≤ Ak ℓ=1 ! N N k X 1 X 1 X (0) (k) ≤ kλi −λk2 kλj −λk2 − c¯ℓ + 2Ak 2Ak j=1 i=1 ℓ=1 √ k k NCg X 2N P Dλ Lg X ˆ (ℓ−1) − λ ˆ ⋆k ˆ (ℓ) k + aℓ kx(ℓ−1) − α aℓ kλ + Ak A k ℓ=1 ℓ=1 √ k k k 1 X 2NDλ2 2N P Dλ Lg X NCg X (ℓ−1) (ℓ) ˆ (ℓ−1) − λ ˆ ⋆k ˆ k+ ≤ c¯ℓ + aℓ kx −α aℓ kλ + 2Ak ℓ=1 Ak Ak A k ℓ=1 ℓ=1

(A.13)

, ξ (k−1) , where the first inequality is owing to the fact that g(x) is convex, and the last inequality is P (k) 2 obtained by dropping − N i=1 kλi −λk followed by applying (16). We claim that lim ξ (k−1) = 0.

k→∞

(A.14)

To see this, note that the first and second terms in ξ (k−1) converge to zero as k → ∞ since P∞ P ˆ (ℓ−1) − λ ˆ ⋆ k also converges to limk→∞ Ak = ∞ and ¯ℓ < ∞. The term A1k kℓ=1 aℓ kλ ℓ=1 c ˆ (k) − λ ˆ ⋆ k = 0 and so does its weighted running average zero since, by Lemma 6, limk→∞ kλ P ˆ (ℓ) k also converges to zero since by [44, Lemma 3]. Similarly, the term A1k kℓ=1 aℓ kx(ℓ−1) − α

ˆ (k) k = 0 due to (50). limk→∞ kx(k−1) − α (k−1) ) + ) ˆ ⋆ k + δ ≤ Dλ by (17). ˆ ⋆ + δ (g(xˆ which lies in D, since kλk ≤ kλ Now let λ = λ + (k−1) ˆ k(g(x )) k Substituting λ into (A.13) gives rise to + ˆ (k−1) ) k ≤ ξ (k−1) . δk g(x (A.15) November 7, 2013

DRAFT

28

As a result, the first term in (58) is obtained by taking k → ∞ in (A.15) and by (A.14). ˆ ⋆ + δ λˆ (k−1) ∈ D. By To show that the second limit in (58) holds true, we first let λ = λ ˆ (k−1)  kλ(k−1)k D (k−1) T (k−1) λ ˆ ˆ substituting it into (A.13) and by (16), we obtain (λ ) g(x )≤ δ ξ which, by taking k → ∞, leads to

ˆ (k−1) )T g(x ˆ (k−1) ) ≤ 0. lim sup (λ

(A.16)

k→∞

ˆ (k−1) )T g(x ˆ (k−1) ) ≤ On the other hand, by letting λ = 0 ∈ D, from (A.13) we have −(λ ˆ⋆ − λ ˆ (k−1) )T g(x ˆ (k−1) − λ ˆ ⋆ k. Since limk→∞ ξ (k−1) = 0 and ˆ (k−1) ) ≤ ξ (k−1) + NCg kλ ξ (k−1) + (λ

ˆ (k) − λ ˆ ⋆ k = 0 by Lemma 6, it follows that lim inf k→∞ (λ ˆ (k−1) )T g(x ˆ (k−1) ) ≥ 0, which limk→∞ kλ 

along with (A.16) yields the second term in (58). A PPENDIX D P ROOF

OF

L EMMA 8

ˆ (k) in (43) implies that The definition of α ˆ (k−1) + (α ˆ i )λ ˆ i − xi giT (α (k)

(k)

(k−1) T

+

(k−1)

) ∇fiT (xi

)∇F (N yˆ(k−1) )

1 (k) (k−1) 2 (k−1) ˆ (k−1) ˆ i − xi kα k ≤ giT (xi )λ , 2ρ1

which, by summing over i = 1, . . . , N, yields ˆ (k−1) + (α ˆ (k) )λ ˆ (k) − x(k−1) )T ∇F¯ (x(k−1) ) g T (α + ˆ (k) ) = where g(α

PN

i=1

1 ˆ (k−1) , ˆ (k) − x(k−1) k2 ≤ g T (x(k) )λ kα 2ρ1

(A.17)

(k)

ˆ i ). By substituting the decent lemma in [49, Lemma 2.1] giT (α

¯ α ¯ (k−1) ) + GF¯ kα ˆ (k) − x(k−1) k2 ˆ (k) ) ≤ F¯ (x(k−1) ) + (α ˆ (k) − x(k−1) )T ∇F(x F( 2

(A.18)

into (A.17), we then obtain   1 GF¯ ˆ (k−1) ) − L(α ˆ (k−1) ) ˆ (k) − x(k−1) k2 ≤ L(x(k−1) , λ ˆ (k) , λ kα − 2ρ1 2 which, after combining with (A.10), yields (60). (k)

ˆi To show the second part of this lemma, let us recall (A.3) that α

in (43) can be alternatively

written as (k)

ˆi α DRAFT

(k−1)

ˆ (k−1) + (∇f T (x = arg min giT (αi )λ i i αi ∈Xi

)∇F (N yˆ(k−1) ) +

1 (k) (k−1) T ˆ − xi (α )) αi , ρ1 i November 7, 2013

29

which implies that, for all xi ∈ Xi , we have 1 (k) (k−1) T (k) (k−1) ˆ i − xi ˆ i − xi (α )) (α ) ρ1 (k) (k−1) T (k−1) ˆ (k−1) + (∇f T (x(k−1) )∇F (N yˆ(k−1) ) + 1 (α ˆ − xi ≤ giT (xi )λ )) (xi − xi ). i i ρ1 i

ˆ (k−1) + (∇f T (x ˆ i )λ giT (α i i (k)

(k−1)

)∇F (N yˆ(k−1) ) +

By summing the above inequality over i = 1, . . . , N, one obtains, for all x ∈ X , ˆ ⋆ + ∇F¯ T (x(k−1) )(α ˆ (k) )λ ˆ (k) − x(k−1) ) + g T (α

1 ˆ (k) − x(k−1) k2 kα ρ1

ˆ ⋆ + ∇F¯ T (x(k−1) )(x − x(k−1) ) ≤ g T (x)λ +

N 1 X (k) (k−1) (k−1) ˆ⋆ − λ ˆ (k−1) )(g(α ˆ − xi ˆ (k) ) − g(x)) (α )(xi − xi ) + (λ ρ1 i=1 i

N 2Dx X (k) (k−1) ⋆ (k−1) ˆ ˆ⋆ − λ ˆ (k−1) k, ¯ ¯ ˆ i − xi ≤ g (x)λ + F (x) − F (x )+ kα k + 2Cg kλ ρ1 i=1 T

¯ boundedness of Xi and the constraint functions where we have utilized the convexity of F, (cf. Assumption 1 and (30)) in obtaining the last inequality. By applying (A.18) to the above inequality and by the premise of 1/ρ1 ≥ GF¯ > GF¯ /2, we further obtain, for all x ∈ X , N X (k) (k−1) ˆ ⋆) ≤ L(x, λ ˆ ⋆ ) + 2Dx ˆ i − xi ˆ ⋆, λ kα k L(x ρ1 i=1

ˆ⋆ − λ ˆ (k−1) k + |L(x ˆ ⋆ ) − L(α ˆ ⋆ )|, ˆ ⋆, λ ˆ (k) , λ + 2Cg kλ

(A.19)

in which one can bound the last term, using (34), (27), (29) and (16), by √ ˆ ⋆ ) − L(α ˆ ⋆ )| ≤ (LF¯ + NDλ P Lg )kx ˆ ⋆, λ ˆ (k) , λ ˆ⋆ − α ˆ (k) k. |L(x

(A.20)

ˆ (k−1) ) → 0 and (x(k−1) , λ ˆ (k−1) ) converges to some ˆ (k) , λ Suppose that L(x(k−1) , βˆ(k) ) − L(α ˆ (k−1) ) − ˆ ⋆ ) as k → ∞. Then, by (60) and since 1/ρ1 ≥ GF¯ , we have k(x(k−1) , λ ˆ ⋆, λ limit point (x ˆ (k) , βˆ(k) )k → 0, as k → ∞. Therefore, (α lim

k→∞

N 2Dx X (k) (k−1) ˆ⋆ − λ ˆ (k−1) k + |L(α ˆ ⋆ ) − L(x ˆ ⋆ )| ˆ (k) , λ ˆ⋆, λ ˆ i − xi k + 2Cg kλ kα ρ1 i=1

!

= 0.

ˆ ⋆ ) ≤ L(x, λ ˆ ⋆ ) for all ˆ ⋆, λ Thus, it follows from (A.19), (A.20) and the above equation that L(x x ∈ X . The rest of the proof is similar to that of Lemma 5. November 7, 2013



DRAFT

30

R EFERENCES [1] V. Lesser, C. Ortiz, and M. Tambe, Distributed Sensor Networks: A Multiagent Perspective. Kluwer Academic Publishers, 2003. [2] M. Rabbat and R. Nowak, “Distributed optimization in sensor networks,” in Proc. ACM IPSN, Berkeley, CA, USA, April 26-27, 2004, pp. 20–27. [3] M. Chiang, P. Hande, T. Lan, and W. C. Tan, “Power control in wireless cellular networks,” Foundations and Trends in Networking, vol. 2, no. 4, pp. 381–533, 2008. [4] S. Chao, T.-H. Chang, K.-Y. Wang, Z. Qiu, and C.-Y. Chi, “Distributed robust multicell coordianted beamforming with imperfect csi: An ADMM approach,” IEEE Trans. Signal Processing, vol. 60, no. 6, pp. 2988–3003, 2012. [5] D. Belomestny, A. Kolodko, and J. Schoenmakers, “Regression methods for stochastic control problems and their convergence analysis,” SIAM J. on Control and Optimization, vol. 48, no. 5, pp. 3562–3588, 2010. [6] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York, NY, USA: Springer-Verlag, 2001. [7] M. Elad, Sparse and Redundant Rerpesentations.

New York, NY, USA: Springer Science + Business Media, 2010.

[8] R. Cavalcante, I. Yamada, and B. Mulgrew, "An adaptive projected subgradient approach to learning in diffusion networks," IEEE Trans. Signal Process., vol. 57, no. 7, pp. 2762–2774, Aug. 2009.
[9] B. Johansson, T. Keviczky, M. Johansson, and K. Johansson, "Subgradient methods and consensus algorithms for solving convex optimization problems," in Proc. IEEE CDC, Cancun, Mexico, Dec. 9–11, 2008, pp. 4185–4190.
[10] A. Nedić, A. Ozdaglar, and A. Parrilo, "Distributed subgradient methods for multi-agent optimization," IEEE Trans. Automatic Control, vol. 54, no. 1, pp. 48–61, Jan. 2009.
[11] ——, "Constrained consensus and optimization in multi-agent networks," IEEE Trans. Automatic Control, vol. 55, no. 4, pp. 922–938, April 2010.
[12] I. Lobel and A. Ozdaglar, "Distributed subgradient methods for convex optimization over random networks," IEEE Trans. Automatic Control, vol. 56, no. 6, pp. 1291–1306, June 2011.
[13] S. S. Ram, A. Nedić, and V. V. Veeravalli, "Distributed stochastic subgradient projection algorithms for convex optimization," J. Optim. Theory Appl., vol. 147, pp. 516–545, 2010.
[14] J. Chen and A. H. Sayed, "Diffusion adaptation strategies for distributed optimization and learning over networks," IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4289–4305, Aug. 2012.
[15] M. Zhu and S. Martínez, "On distributed convex optimization under inequality and equality constraints," IEEE Trans. Automatic Control, vol. 57, no. 1, pp. 151–164, Jan. 2012.
[16] D. Yuan, S. Xu, and H. Zhao, "Distributed primal-dual subgradient method for multiagent optimization via consensus algorithms," IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 41, no. 6, pp. 1715–1724, Dec. 2011.
[17] S. S. Ram, A. Nedić, and V. V. Veeravalli, "A new class of distributed optimization algorithms: Application to regression of distributed data," Optimization Methods and Software, vol. 27, no. 1, pp. 71–88, 2012.
[18] T.-H. Chang, M. Alizadeh, and A. Scaglione, "Coordinated home energy management for real-time power balancing," in Proc. IEEE PES General Meeting, San Diego, CA, July 22–26, 2012, pp. 1–8.
[19] N. Li, L. Chen, and S. H. Low, "Optimal demand response based on utility maximization in power networks," in Proc. IEEE PES General Meeting, Detroit, MI, USA, July 24–29, 2011, pp. 1–8.
[20] J. C. V. Quintero, "Decentralized control techniques applied to electric power distributed generation in microgrids," M.S. thesis, Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, 2009.
[21] M. Kallio and A. Ruszczyński, "Perturbation methods for saddle point computation," Report No. WP-94-38, International Institute for Applied Systems Analysis, 1994.
[22] M. Kallio and C. H. Rosa, "Large-scale convex optimization via saddle-point computation," Oper. Res., pp. 93–101, 1999.
[23] A. Olshevsky and J. N. Tsitsiklis, "Convergence rates in distributed consensus and averaging," in Proc. IEEE CDC, San Diego, CA, USA, Dec. 13–15, 2006, pp. 3387–3392.
[24] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Systems and Control Letters, vol. 53, pp. 65–78, 2004.
[25] B. Yang and M. Johansson, "Distributed optimization and games: A tutorial overview," in Networked Control Systems, A. Bemporad, M. Heemels, and M. Johansson, Eds. Springer-Verlag, LNCIS 406, 2010, ch. 4.
[26] H. Uzawa, "Iterative methods in concave programming," in Studies in Linear and Nonlinear Programming, K. Arrow, L. Hurwicz, and H. Uzawa, Eds. Stanford, CA: Stanford University Press, 1958, pp. 154–165.
[27] A. Nedić and A. Ozdaglar, "Subgradient methods for saddle-point problems," J. Optim. Theory Appl., vol. 142, pp. 205–228, 2009.
[28] K. Srivastava, A. Nedić, and D. Stipanović, "Distributed Bregman-distance algorithms for min-max optimization," in Agent-Based Optimization, I. Czarnowski, P. Jedrzejowicz, and J. Kacprzyk, Eds. Springer Studies in Computational Intelligence (SCI), 2012.
[29] M. Alizadeh, X. Li, Z. Wang, A. Scaglione, and R. Melton, "Demand side management in the smart grid: Information processing for the power switch," IEEE Signal Process. Mag., vol. 29, no. 5, pp. 55–67, Sept. 2012.
[30] X. Guan, Z. Xu, and Q.-S. Jia, "Energy-efficient buildings facilitated by microgrid," IEEE Trans. Smart Grid, vol. 1, no. 3, pp. 243–252, Dec. 2010.
[31] N. Cai and J. Mitra, "A decentralized control architecture for a microgrid with power electronic interfaces," in Proc. North American Power Symposium (NAPS), Sept. 26–28, 2010, pp. 1–8.
[32] D. Hershberger and H. Kargupta, "Distributed multivariate regression using wavelet-based collective data mining," J. Parallel Distrib. Comput., vol. 61, no. 3, pp. 372–400, March 2001.
[33] H. Kargupta, B.-H. Park, D. Hershberger, and E. Johnson, "Collective data mining: A new perspective toward distributed data mining," in Advances in Distributed Data Mining, H. Kargupta and P. Chan, Eds. AAAI/MIT Press, 1999.
[34] D. P. Bertsekas, Network Optimization: Continuous and Discrete Models. Athena Scientific, 1998.

[35] R. Madan and S. Lall, "Distributed algorithms for maximum lifetime routing in wireless sensor networks," IEEE Trans. Wireless Commun., vol. 5, no. 8, pp. 2185–2193, Aug. 2006.
[36] T.-H. Chang, A. Nedić, and A. Scaglione, "Distributed sparse regression by consensus-based primal-dual perturbation optimization," accepted by IEEE Global Conf. on Signal and Info. Process. (GlobalSIP), Austin, TX, Dec. 3–5, 2013.
[37] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK: Cambridge University Press, 2004.

[38] D. P. Bertsekas, A. Nedić, and A. E. Ozdaglar, Convex Analysis and Optimization. Cambridge, MA: Athena Scientific, 2003.
[39] I. Y. Zabotin, "A subgradient method for finding a saddle point of a convex-concave function," Issled. Prikl. Mat., vol. 15, pp. 6–12, 1988.
[40] Y. Nesterov, "Smooth minimization of nonsmooth functions," Math. Program., vol. 103, no. 1, pp. 127–152, 2005.


[41] G. Mateos, J. A. Bazerque, and G. B. Giannakis, "Distributed sparse linear regression," IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5262–5276, Dec. 2010.
[42] J. F. C. Mota, J. M. F. Xavier, P. M. Q. Aguiar, and M. Püschel, "Distributed basis pursuit," IEEE Trans. Signal Process., vol. 60, no. 4, pp. 1942–1956, April 2012.
[43] A. Nedić and A. Ozdaglar, "Approximate primal solutions and rate analysis for dual subgradient methods," SIAM J. Optim., vol. 19, no. 4, pp. 1757–1780, 2009.
[44] T. Larsson, M. Patriksson, and A.-B. Strömberg, "Ergodic, primal convergence in dual subgradient schemes for convex programming," Math. Program., vol. 86, pp. 283–312, 1999.
[45] B. T. Polyak, Introduction to Optimization. New York: Optimization Software Inc., 1987.

[46] T.-H. Chang, A. Nedić, and A. Scaglione, "Electronic companion for 'Distributed constrained optimization by consensus-based primal-dual perturbation method'," available on http://arxiv.org.
[47] T.-H. Chang, M. Alizadeh, and A. Scaglione, "Real-time power balancing via decentralized coordinated home energy scheduling," IEEE Trans. Smart Grid, vol. 4, no. 3, pp. 1490–1504, Sept. 2013.
[48] I. J. Lustig, R. E. Marsten, and D. F. Shanno, "Interior point methods for linear programming: Computational state of the art," ORSA J. Comput., vol. 6, no. 1, pp. 1–14, 1994.
[49] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1989.

DRAFT

November 7, 2013
