2015 IEEE 54th Annual Conference on Decision and Control (CDC) December 15-18, 2015. Osaka, Japan
Optimal Incentive Design for Distributed Stabilizing Control of Nonlinear Dynamic Networks

Hunmin Kim and Minghui Zhu

Abstract— In many dynamic networks, control authorities pursue heterogeneous and even partially conflicting subobjectives. The misaligned interests threaten the reliability of dynamic networks and degrade their operating performance. In this paper, we focus on the design and analysis of mechanisms where a system operator provides a reward to incentivize control authorities to implement distributed stabilizing controllers. The proposed mechanisms are based on a bidding scheme where the bid of each control authority is represented by a newly derived local stability index. The mechanism induces a non-cooperative game between local control authorities while minimizing the disclosure of their private information. By perturbing the game, we design an optimal incentive mechanism where the system operator can ensure network-wide stability and simultaneously maximize a social welfare while minimizing the reward size and perturbations.
I. INTRODUCTION

Multi-agent systems are characterized by intricate couplings between heterogeneous components of control, communication, computation and humans. In a number of applications, e.g., smart grid and intelligent transportation systems, agents are selfish and pursue heterogeneous or even conflicting subobjectives. To mitigate the issue, a common practice is incentive or mechanism design, which modifies agents' preferences via side payments such that individual interests are aligned with social preferences. The most well-known static mechanism is the Vickrey-Clarke-Groves (VCG) auction [3], [6], [22]. Moreover, incentive design has been extended to dynamic scenarios and comprises two classes: direct and indirect. Algorithmic mechanism design is direct: agents are incentivized to follow algorithms specified by the system operator to solve computation or control problems [14]–[16], [21]. Incentive control, e.g., in [1], [7], is indirect: agents' choices are influenced by rewards or prices such that a social welfare can be optimized. A common feature of current mechanism design is to formulate the problems of interest as leader-follower problems. The dynamic systems of the agents, the followers, are embedded in the problems of the incentive designers, the leaders. This requires selfish agents to report a large amount of (potentially private) information to the incentive designers. Distributed stabilizing control has been studied extensively, e.g., in [19]. The survey [18] organizes a wide range of multi-agent control methods in power, urban traffic, communication, manufacturing, etc. However, incentive design for distributed stabilizing control has rarely been studied.

Hunmin Kim and M. Zhu are with the Department of Electrical Engineering, Pennsylvania State University, 201 Old Main, University Park, PA 16802, [email protected], [email protected].
Contributions. In this paper, we study optimal incentive design for distributed stabilizing control of a class of nonlinear dynamic networks. To minimize the disclosure of private information of control authorities, our approach consists of two steps. The first step is to derive a set of new distributed scalar stability indices. The set of indices allows the system operator to ensure network-wide stability without accessing local information beyond the indices. We determine a class of systems whose stability indices can be rendered arbitrarily small. The second step is to incentivize the control authorities to choose small enough scalar stability indices. Receiving rewards corresponding to a Nash equilibrium, the control authorities synthesize distributed controllers to commit to their proposals. We identify the optimal reward size and perturbations to maximize a social welfare.

Notations. Denote the supremum norm of the truncation of $u(t)$ on $[t_1, t_2]$ by $\|u\|_{[t_1,t_2]} \triangleq \sup_{t_1 \le t \le t_2} \|u(t)\|$. Define $|S|$ as the cardinality of set $S$. Define the following for simplicity: $k_{i,1\to p} \triangleq \{k_{i,1}, \cdots, k_{i,p}\}$, $x_S \triangleq \{x_j\}_{j\in S}$, $\bar a \triangleq \sum_{i} a_i$, and $s_{-i} \triangleq s \setminus s_i$. Vector $\vec 0_n$ denotes the $n$-dimensional zero vector. A directed graph $G = (V, E)$ is a graph where each edge has an associated direction. A directed graph without directed cycles is called acyclic.

Definition 1.1: A topological ordering $O = \{i_1, i_2, \cdots, i_n\}$ of a directed acyclic graph $G$ is a linear ordering of its nodes in which node $j$ comes before node $k$ whenever $\{j, k\} \in E$. Linear ordering means that antisymmetry, transitivity and totality are satisfied under the operator $\le_o$, where $i_a \le_o i_b$ if $a \le b$; indices $a$ and $b$ represent the $a$th and $b$th elements of the ordering, respectively. Moreover, a directed graph $G$ is acyclic if and only if $G$ has a topological ordering (pp. 598–599 in [17]).

II. PROBLEM FORMULATION

In Section II-A, we present the model of the dynamic networks considered in this paper. The problem and our two-step solution are explained in Section II-B.

A. Network model

The physical network is described by a directed graph $G = (V, E)$ where $V \triangleq \{1, \cdots, N\}$ denotes the set of nodes in the network and $E \subseteq V \times V$ denotes the set of edges. Set $N_i = \{j_1, \cdots, j_{m_i}\}$ denotes the time-invariant set of neighboring nodes of $i \in V$; i.e., $N_i \triangleq \{j \in V \setminus \{i\} \mid \{j, i\} \in E\}$. Each node $i \in V$ belongs to a local control authority and
is associated with a single-input dynamic system of the form

$$\dot z_i = h_i(z_i, x_i, x_{N_i}, t)$$
$$\dot x_i = A_i x_i + B_i u_i + \sum_{j\in N_i} g_{ij}(x_j, t) \qquad (1)$$

where $z_i \in \mathbb R^{n_{z_i}}$, $x_i = [x_{i,1}, \cdots, x_{i,n_i}]^T \in \mathbb R^{n_i}$, $x_j \in \mathbb R^{n_j}$, $u_i \in \mathbb R$, $A_i \in \mathbb R^{n_i \times n_i}$, $B_i \triangleq [0, \cdots, 0, 1]^T \in \mathbb R^{n_i}$ and

$$A_i \triangleq \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -\alpha_{i,1} & -\alpha_{i,2} & -\alpha_{i,3} & \cdots & -\alpha_{i,n_i} \end{bmatrix}.$$

Let $g_{ij}(x_j, t) = [g_{ij,1}(x_j, t), \cdots, g_{ij,n_i}(x_j, t)]^T$ denote time-varying nonlinear functions, and define the network states $x \triangleq [x_1^T, \cdots, x_N^T]^T$ and $z \triangleq [z_1^T, \cdots, z_N^T]^T$. Multi-machine power systems are examples of system (1).

Assumption 2.1: For $\forall i \in V$, $\forall j \in N_i$, $\forall l \in \{1, \cdots, n_i\}$, function $g_{ij,l}(\cdot,\cdot)$ is uniformly globally Lipschitz; i.e., there exists a constant $L_{ij,l} > 0$ such that $\|g_{ij,l}(y,t) - g_{ij,l}(z,t)\| \le L_{ij,l}\|y - z\|$ for $\forall y, z \in \mathbb R^{n_j}$ and $\forall t \ge t_0$.

Assumption 2.2: Subsystem $\dot z_i = h_i(z_i, x_i, x_{N_i}, t)$ in (1) is input-to-state stable (ISS) [20] with respect to $x_i$ and $x_{N_i}$.

For notational simplicity, we suppress the time dependence in function $g_{ij}(\cdot,\cdot)$, writing $g_{ij}(x_j, t)$ and $g_{ij,l}(x_j, t)$ as $g_{ij}(x_j)$ and $g_{ij,l}(x_j)$, respectively.

B. Problem Formulation

A fundamental objective is to ensure the stability of (1). However, the objective is challenging when control authorities are selfish. In particular, control authorities may choose local controllers according to their own preferences. Without coordination, the aggregation of such local controllers could fail to ensure network-wide stability. As a result, the system operator wants to incentivize self-interested control authorities such that distributed stabilizing controllers are adopted. This introduces the problem of optimal incentive design for distributed stabilizing control.

To solve this problem, we propose a two-step solution. In the first step, a new distributed scalar stability index is derived. The set of indices alone enables the system operator to ensure network-wide stability, and a class of systems is identified whose stability indices can be made arbitrarily small. The second step is to stimulate the control authorities with rewards to choose small enough scalar stability indices. This mechanism induces a non-cooperative game among the control authorities. The control authorities receive rewards according to a Nash equilibrium and design distributed stabilizing controllers to commit to their proposals. Optimal perturbations and reward size are identified such that the networked system is stable and a social welfare is maximized.

III. STEP 1: DISTRIBUTED STABILIZING CONTROLLER DESIGN

In this section, we establish a general framework for designing distributed controllers which globally asymptotically stabilize the networked system (1). In particular, the induced stability condition is determined by a set of local stability indices. To find such indices, a preliminary coordinate transformation is conducted in Section III-A. We then formally analyze stability and express the sufficient conditions as a set of local stability indices in Section III-B. In the same section, we determine a class of systems whose stability indices can be made sufficiently small.

A. Coordinate Transformation

Consider the system state $x_i$ in (1). Intuitively speaking, if each $x_{i,l+1}$ acts as a virtual controller to stabilize $x_{i,l}$ for $\forall l \in \{1, \cdots, n_i - 1\}$ and $u_i$ stabilizes $x_{i,n_i}$, then the system state $x_i$ can be rendered ISS with respect to the neighboring states $x_j$. Moreover, to apply the distributed constraint small gain theorem [23], we expect the ISS gains to be contraction mappings. To achieve this, consider the following coordinate transformation inspired by backstepping (p. 589 in [8]):

$$\hat z_i = z_i, \quad \hat x_i = P_i x_i \qquad (2)$$

where $P_i \in \mathbb R^{n_i \times n_i}$ is the lower triangular matrix

$$P_i \triangleq \begin{bmatrix} \hat f_{i,1,1} & 0 & \cdots & & 0 \\ \vdots & \ddots & & & \vdots \\ \hat f_{i,l,1} & \cdots & \hat f_{i,l,l} & 0 \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ \hat f_{i,n_i,1} & & \cdots & & \hat f_{i,n_i,n_i} \end{bmatrix}$$

where $\hat f_{i,l,b} \triangleq \sum_{n=b}^{l-1} \hat f_{i,n,b} f_{i,l,n}$ for $\forall l \in \{1, \cdots, n_i\}$, $\forall b \in \{1, \cdots, l-1\}$, and $\hat f_{i,l,b} \triangleq 1$ if $l = b$. Here, $f_{i,l,b} \triangleq f_{i,l-1,b-1} - k_{i,b} f_{i,l-1,b}$ for $\forall b \in \{1, \cdots, l-2\}$, $f_{i,l,l-1} \triangleq f_{i,l-1,l-2} + k_{i,l-1}$, and $f_{i,l,0} \triangleq 0$, for $\forall l \in \{2, \cdots, n_i\}$. Each $k_{i,l} > 0$ is a positive constant. Choose the input $u_i$ as

$$u_i = \sum_{l=1}^{n_i} \alpha_{i,l} x_{i,l} - \sum_{l=2}^{n_i} \hat f_{i,n_i,l-1} x_{i,l} - k_{i,n_i} \hat x_{i,n_i}. \qquad (3)$$
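To make the recurrences concrete, here is a minimal Python sketch (our illustration, not code from the paper) that builds $P_i$ from the gains $k_{i,1}, \cdots, k_{i,n_i}$ via the definitions of $f_{i,l,b}$ and $\hat f_{i,l,b}$ above; for $n_i = 2$ it yields $P_i = [[1, 0], [k_{i,1}, 1]]$, the familiar backstepping change of variables.

```python
import numpy as np

def transformation_matrix(k):
    """Lower-triangular P_i of (2) from the gains k = [k_{i,1}, ..., k_{i,n_i}],
    following the recurrences stated below (2) (1-based paper indices l, b)."""
    n = len(k)
    f = np.zeros((n + 1, n + 1))     # f[l][b] = f_{i,l,b}; f_{i,l,0} = 0 by convention
    for l in range(2, n + 1):
        f[l][l - 1] = f[l - 1][l - 2] + k[l - 2]   # f_{i,l,l-1} = f_{i,l-1,l-2} + k_{i,l-1}
        for b in range(1, l - 1):                  # b in {1, ..., l-2}
            f[l][b] = f[l - 1][b - 1] - k[b - 1] * f[l - 1][b]
    fhat = np.zeros((n + 1, n + 1))  # fhat[l][b] = \hat f_{i,l,b}
    for l in range(1, n + 1):
        fhat[l][l] = 1.0                           # \hat f_{i,l,l} = 1
        for b in range(1, l):                      # \hat f_{i,l,b} = sum_{m=b}^{l-1} \hat f_{i,m,b} f_{i,l,m}
            fhat[l][b] = sum(fhat[m][b] * f[l][m] for m in range(b, l))
    return fhat[1:, 1:]              # P_i entries: P[l-1][b-1] = \hat f_{i,l,b}

print(transformation_matrix([2.0, 3.0]))       # [[1. 0.] [2. 1.]]
print(transformation_matrix([2.0, 3.0, 4.0]))  # row 3: [k1*k2, k1+k2, 1] = [6. 5. 1.]
```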
With the coordinate transformation (2) and input (3), system (1) becomes

$$\dot{\hat x}_{i,l} = -k_{i,l}\hat x_{i,l} + \hat x_{i,l+1} + \sum_{j\in N_i} \hat g_{ij,l}(\hat x_j, k_{i,1\to l-1}, k_{j,1\to n_j})$$
$$\dot{\hat x}_{i,n_i} = -k_{i,n_i}\hat x_{i,n_i} + \sum_{j\in N_i} \hat g_{ij,n_i}(\hat x_j, k_{i,1\to n_i-1}, k_{j,1\to n_j})$$
$$\dot{\hat z}_i = \hat h_i(\hat z_i, \hat x_i, \hat x_{N_i}, t) \qquad (4)$$

for $\forall l \in \{1, \cdots, n_i - 1\}$, where $\hat h_i(\hat z_i, \hat x_i, \hat x_{N_i}, t) \triangleq h_i(z_i, x_i, x_{N_i}, t)$ and $\hat g_{ij,l}(\hat x_j, k_{i,1\to l-1}, k_{j,1\to n_j}) \triangleq g_{ij,l}(x_j) + \sum_{b=1}^{l-1} f_{i,l,b}(k_{i,1\to b}) g_{ij,b}(x_j)$ for $\forall l \in \{1, \cdots, n_i\}$. Define the subsystem states $\hat x \triangleq [\hat x_1^T, \cdots, \hat x_N^T]^T$ and $\hat z \triangleq [\hat z_1^T, \cdots, \hat z_N^T]^T$. A new Lipschitz constant for $\hat g_{ij,l}(\hat x_j, \cdot)$ is defined by $\hat L_{ij,l}(k_{i,1\to l-1}, k_{j,1\to n_j}) \triangleq \|P_j^{-1}\|(L_{ij,l} + \sum_{b=1}^{l-1} |f_{i,l,b}(k_{i,1\to b})| L_{ij,b})$, where $P_j^{-1}$ is well defined since all the diagonal elements of the triangular matrix $P_j$ are 1.
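As a quick numerical sanity check (our own, with arbitrarily chosen gains and no couplings, $g_{ij} = 0$), one can verify for $n_i = 2$ that the closed loop under (2) and (3) reduces to the cascade form (4):

```python
import numpy as np

# n_i = 2, g_ij = 0: P_i = [[1, 0], [k1, 1]] from the recurrences above.
k1, k2 = 2.0, 3.0
a1, a2 = 0.5, -1.0                                       # last row of A_i is [-a1, -a2]
A = np.array([[0.0, 1.0], [-a1, -a2]])
B = np.array([0.0, 1.0])
P = np.array([[1.0, 0.0], [k1, 1.0]])

def closed_loop(x):
    xh = P @ x                                           # \hat x_i = P_i x_i, eq. (2)
    u = a1 * x[0] + a2 * x[1] - k1 * x[1] - k2 * xh[1]   # input (3) for n_i = 2
    return A @ x + B * u

x = np.array([0.7, -0.4])
xh, xh_dot = P @ x, P @ closed_loop(x)
# Expected from (4): d/dt xh_1 = -k1 xh_1 + xh_2 and d/dt xh_2 = -k2 xh_2.
print(np.allclose(xh_dot, [-k1 * xh[0] + xh[1], -k2 * xh[1]]))   # True
```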
B. Stability Analysis

The positive control gain $k_{i,l}$ can be tuned arbitrarily large so that the ISS gains of $\hat x_{i,l}$ are contraction mappings with respect to the neighboring states. However, the choice of control gain $k_{i,l}$ depends on some other control gains $k_{j,q}$. This dependency is referred to as a parent-child relation. To clarify this relation, we introduce some notation.

Denote by $G_k = (V_k, E_k)$ a directed k graph where $V_k$ is the set of nodes and $E_k \subseteq V_k \times V_k$ is the set of edges. All the pairs $(i, l)$ are elements of $V_k$; i.e., node $(i, l) \in V_k$ for $\forall i \in V$, $\forall l \in \{1, \cdots, n_i\}$. The directed k graph $G_k$ can be constructed from (4): if $\hat g_{ij,l}$ depends on $k_{j,q}$, then $\{(j,q),(i,l)\} \in E_k$ where $(j,q),(i,l) \in V_k$. In the directed k graph $G_k$, we define node $(i,l)$ as a child of $(j,q)$ if $\{(j,q),(i,l)\} \in E_k$, and we define the set of child nodes of $(j,q)$ as $Child_{j,q} \triangleq \{(i,l) \in V_k \mid \{(j,q),(i,l)\} \in E_k\}$. Likewise, node $(j,q)$ is said to be a parent of $(i,l)$ if $\{(j,q),(i,l)\} \in E_k$, and we define the set of parent nodes of $(i,l)$ as $Parent_{i,l} = \{(j,q) \in V_k \mid \{(j,q),(i,l)\} \in E_k\}$. In these sets, subscript $i$ or $l$ could be identical to $j$ or $q$, respectively, e.g., $i = j$ or $l = q$; however, both subscripts cannot be identical at the same time, i.e., there are no self-loops. Moreover, node $(i,l)$ is a child of $(j,q)$ if and only if $(j,q)$ is a parent of $(i,l)$, and every parent $(j,q)$–child $(i,l)$ pair is an element of $E_k$. It will become clear that the meaning of the parent $(j,q)$–child $(i,l)$ relation is that the selection of gain $k_{i,l}$ must depend on $k_{j,q}$ to ensure that the ISS gains of $\hat x_{i,l}$ are contraction mappings.

Theorem 3.1: The following statements hold for any set of positive scalars $0 < \delta_{i,l} < 1$ and $0 < \delta_{ij,l} < 1$ for $\forall i \in V$, $\forall l \in \{1, \cdots, n_i\}$ such that $0 < 1 - \delta_{i,l} - \sum_{j\in N_i} \delta_{ij,l} < 1$.

(P1) Under Assumption 2.1, with the controller (3), the subsystem $\hat x_i$ of (4) satisfies the following ISS inequalities for $\forall l \in \{1, \cdots, n_i\}$:

$$\|\hat x_{i,l}(t)\| \le \max\{(m_i + 2)\, e^{-k_{i,l}(1 - \delta_{i,l} - \sum_{j\in N_i}\delta_{ij,l})(t - t_0)}\|\hat x_{i,l}(t_0)\|,\ \gamma_{i,l}\|\hat x_{i,l+1}\|_{[t_0,t]},\ \gamma_{ij_1,l}\|\hat x_{j_1}\|_{[t_0,t]},\ \cdots,\ \gamma_{ij_{m_i},l}\|\hat x_{j_{m_i}}\|_{[t_0,t]}\}$$

where the ISS gains $\gamma_{i,l}$ and $\gamma_{ij,l}$ are given by $\gamma_{i,l} = \frac{m_i+2}{k_{i,l}\delta_{i,l}}$ and $\gamma_{ij,l} = \frac{(m_i+2)\hat L_{ij,l}}{k_{i,l}\delta_{ij,l}}$.

(P2) If the directed k graph $G_k$ is acyclic, then given any set $\gamma_V^*$ of stability indices with $\gamma_i^* > 0$ for $\forall i \in V$, there is a set of gains $k$:

$$k_{i,l} > \max_{j\in N_i}\left\{\frac{(m_i+2)\hat L_{ij,l}}{\delta_{ij,l}\gamma_i^*},\ \frac{m_i+2}{\delta_{i,l}\gamma_i^*}\right\} \qquad (5)$$

for $\forall i \in V$ and $\forall l \in \{1, \cdots, n_i\}$ such that $\max_{j\in N_i,\, l\in\{1,\cdots,n_i\}}\{\gamma_{i,l}, \gamma_{ij,l}\} < \gamma_i^*$ for $\forall i \in V$.

(P3) Under Assumptions 2.1 and 2.2, with the controller (3) and control gains (5), if $0 < \gamma_i^* \le 1$ for $\forall i \in V$, then the networked system is globally asymptotically stable.
Proof: It has been omitted due to space limitations.

By (P2), given any set $\gamma_V^* > 0$ of stability indices, each control authority can choose its control gains $k_{i,l}$ such that $\max_{j\in N_i,\, l\in\{1,\cdots,n_i\}}\{\gamma_{i,l}, \gamma_{ij,l}\} < \gamma_i^*$ under the assumption that $G_k$ is acyclic. This fact will be used in Section IV. Finding a topological ordering $O_k$ of a given directed acyclic graph $G_k$ takes time $O(|V_k| + |E_k|)$ via depth-first search [10]. Though this is not too complex, it still requires three steps: transforming the coordinates, building the k graph, and finding a topological ordering. It is therefore worth studying structural conditions which ensure a directed acyclic graph.

Corollary 3.1: If every $g_{ij,l}$ depends on $x_{j,1\to\min\{l,n_j\}}$, then $G_k = (V_k, E_k)$ has a topological ordering $O_k$.

Proof: It has been omitted due to space limitations.
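For reference, a minimal sketch (ours, not the paper's code) of the depth-first-search topological ordering mentioned above, assuming the k graph is available as adjacency lists keyed by $(i, l)$ pairs:

```python
def topological_ordering(nodes, children):
    """DFS topological sort of a directed acyclic k graph in O(|Vk| + |Ek|).

    nodes: iterable of hashable node ids, e.g. (i, l) pairs.
    children: dict mapping a node (j, q) to the list of its child nodes (i, l).
    Returns a list in which every parent appears before its children.
    """
    order, visited = [], set()

    def visit(v):
        if v in visited:
            return
        visited.add(v)
        for w in children.get(v, []):
            visit(w)
        order.append(v)          # post-order: v after all of its descendants

    for v in nodes:
        visit(v)
    order.reverse()              # reversed post-order = topological order
    return order

# Example: gain k_{1,2} depends on k_{1,1}, and k_{2,1} depends on k_{1,2}.
print(topological_ordering([(1, 1), (1, 2), (2, 1)],
                           {(1, 1): [(1, 2)], (1, 2): [(2, 1)]}))
```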
IV. STEP 2: OPTIMAL INCENTIVE DESIGN

In this section, assuming that the directed k graph $G_k$ is acyclic, we propose an optimal reward-based mechanism which encourages the control authorities to enforce the stability condition derived in Theorem 3.1:

$$\max_{j\in N_i,\, l\in\{1,\cdots,n_i\}}\{\gamma_{i,l}, \gamma_{ij,l}\} < \gamma_i^*, \quad 0 < \gamma_i^* \le 1, \quad \forall i \in V. \qquad (6)$$
In Section IV-A, we describe the payoff model and the decision-making schemes of the control authorities as well as the system operator. The payoff model represents a non-cooperative game among the control authorities, and we use Nash equilibrium as the solution notion. The properties of the Nash equilibrium are analyzed in Section IV-B. Based on these properties, we study optimality, where the networked system is stable and a social welfare is maximized, in Section IV-C. Optimality is extended in Section IV-D, where the system operator achieves optimality via minimal perturbations and reward.

A. Incentive mechanism

We propose a reward-based scheme built on the raffle schemes in [11] to address the problem of incentive design.

1) Payoff model: The system operator adopts bid $s_i \ge 0$ as the contribution made by control authority $i$ to system stability. The control authority receives rewards by submitting a non-negative bid $s_i$, but must fulfill a local stability index $\gamma_i^* \le \frac{1}{s_i}$, or $\gamma_i^* < \infty$ if $s_i = 0$. To satisfy the stability condition (6), the system operator desires to decrease the stability indices $\gamma_i^*$ and correspondingly increase the bids $s_i$. To stimulate contributions, the system operator provides a fixed amount of reward $R > 0$. Consider the payoff function of control authority $i$:

$$U_i(s) \triangleq \begin{cases} \frac{s_i - c_i}{\bar s - \bar c} R + h_i(\bar s - R) - \beta_i s_i & \text{for } \bar s \ge R \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

where $s \triangleq \{s_i\}_{i\in V}$. We refer to (7) as the unperturbed payoff function when $c_i = 0$ and $\beta_i = 1$ for $\forall i \in V$; otherwise, it is referred to as the perturbed payoff function. Small perturbations $\beta_i$ by the control authority and $c_i$ by the system operator are added to the unperturbed incentive model.

Remark 4.1: The incentive scheme with unperturbed payoff function is studied in [11]. The papers [5], [12] deal with
a model with $\beta_i \ne 1$ to finance public goods. Piece-rate incentive schemes [4] and rank-order tournaments [9] choose $y_i = s_i - c_i$ as an output, where $s_i$ is an effort made by agent $i$ and the payoff is a function of the output $y_i$ in both games. In the former, $c_i$ is a known constant, while it is an unknown random value in the latter. The two schemes are compared experimentally in [2].

Before playing the game, each control authority $i$ chooses a function $h_i : \mathbb R \to \mathbb R$ which stands for its interest in network-wide stability, and a parameter $\beta_i$ with $0 < \beta_i \le 1$ which represents the cost incurred by choosing high control gains $k_{i,l}$ for $\forall l$. Note that a larger $s_i$ demands a set of larger $k_{i,l}$ because $k_{i,l} \propto \frac{1}{\gamma_i^*} \propto s_i$ by (5). The system operator chooses a reward $R \ge 0$ and perturbations $c \ge 0$ where $c \triangleq \{c_i\}_{i\in V}$. The reward $R \ge 0$ is publicized to all the control authorities, but the perturbation $c_i$ is disclosed only to control authority $i$. Each control authority then plays the game by submitting a bid $s_i \ge 0$ and receives payoff $U_i$. If the reward term $\frac{s_i - c_i}{\bar s - \bar c} R$ is negative, control authority $i$ is assumed to pay a fine $\frac{s_i - c_i}{\bar s - \bar c} R$ to the system operator.

We define several notations and make assumptions on the payoff parameters. The value $G_U$ is defined as the solution of $\sum_{i\in V} \frac{dh_i(G_U)}{dv} = \bar\beta + 1 - N$. The value $R_L$ is chosen such that $\frac{R_L}{R_L + G_U - \bar c} > \max_{i\in V}\{\beta_i - \frac{dh_i(G_U)}{dv}\}$. We define $G^*$ as the solution of $\sum_{i\in V} \frac{dh_i(G^*)}{dv} = 1$. Define a control authority $i$ who submits a non-zero bid $s_i > 0$ as an active control authority, and define $V_a \triangleq \{i \in V \mid s_i > 0\}$ as the set of all active control authorities. The function $L_i(R, c)$ is defined by $L_i(R, c) \triangleq c_i + R\left(\frac{R}{R + G_U - \bar c} + \frac{dh_i(G_U)}{dv} - \beta_i\right)$.

Assumption 4.1: The function $h_i(\cdot)$ and the parameters $\beta_i$ and $c_i$ for $\forall i \in V$ are chosen such that
(A1) $h_i(\cdot)$ is twice differentiable, strictly increasing, strictly concave, $h_i(0) = 0$, and $\sum_{i\in V} \frac{dh_i(0)}{dv} > 1$;
(A2) $N - 1 < \bar\beta \le N$ and $0 < \beta_i \le 1$;
(A3) $\bar c \le \min\{G_U, R\}$ and $c_i \ge 0$ for $\forall i \in V$.

Assumptions (A1) and (A2) are enforced by the control authorities and (A3) is enforced by the system operator. (A1) ensures a nonzero social optimum and the existence of $G_U$. (A2) and (A3) restrict the amount of perturbations. Under (A1) and (A2), the values $G_U$, $R_L$ and $G^*$ always exist.

2) Low level decision making - Nash equilibrium: We assume each control authority aims to selfishly maximize its own payoff function. This induces a non-cooperative game among the control authorities. We will use Nash equilibrium [13] as the solution notion of the game.

Definition 4.1: The joint decision $s^*$ is a Nash equilibrium if $U_i(s_i, s^*_{-i}) \le U_i(s^*)$ for $\forall s_i \ge 0$, $\forall i \in V$.

We denote by $s^*(R, c)$ a Nash equilibrium, where the dependency of the Nash equilibrium on $R$ and $c$ is highlighted. After computing a Nash equilibrium $s^*(R, c)$, each control authority $i$ commits to $s^*_i(R, c)$ by adopting a distributed controller $u_i$ such that $\gamma_i^* \le \frac{1}{s_i^*(R,c)}$, or $\gamma_i^* < \infty$ if $s_i^*(R, c) = 0$ (Theorem 3.1 shows that this can always be satisfied by choosing sufficiently large $k$ under the assumption that the directed graph $G_k$ is acyclic). The reward-based incentive mechanism is summarized as Algorithm 1, assuming that $h_i(\cdot)$ and $\beta_i$ are already chosen.
Algorithm 1 Reward based incentive mechanism
1: The system operator chooses $R$ and $c$;
2: The control authorities collectively determine a Nash equilibrium $s^*(R, c)$;
3: Following the steps of Algorithm 2 in Appendix I, each control authority $i$ chooses control gains $k_{i,l}$ for $\forall l$ with $\gamma_i^* \le \frac{1}{s_i^*(R,c)}$, or $\gamma_i^* < \infty$ if $s_i^*(R, c) = 0$;
4: Each control authority implements the distributed controller $u_i$ in (3).
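To illustrate the game induced by the payoff (7), here is a small best-response sketch; it is our own illustration, not an algorithm from the paper, and the logarithmic $h_i$, the bid cap of 100, and all numeric values are hypothetical choices:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def payoff(i, s, R, c, beta, h):
    """Payoff U_i(s) of control authority i from eq. (7)."""
    s_bar, c_bar = sum(s), sum(c)
    if s_bar < R:
        return 0.0
    return (s[i] - c[i]) / (s_bar - c_bar) * R + h(s_bar - R) - beta[i] * s[i]

def best_response_nash(R, c, beta, h, n, iters=50):
    """Iterated best responses; at a fixed point, no authority can gain by
    unilaterally changing its bid (Definition 4.1)."""
    s = np.ones(n)
    for _ in range(iters):
        for i in range(n):
            res = minimize_scalar(
                lambda si: -payoff(i, np.r_[s[:i], si, s[i+1:]], R, c, beta, h),
                bounds=(0.0, 100.0), method="bounded")
            s[i] = res.x
    return s

# Illustrative parameters: h_i(v) = 2*log(1+v), beta_i = 1, c_i = 0 (unperturbed).
n, R = 3, 5.0
s_star = best_response_nash(R, np.zeros(n), np.ones(n), lambda v: 2 * np.log1p(v), n)
print(s_star, "-> commit to stability indices gamma_i* <= 1/s_i*")
```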
We assume that the control authorities are trusted to faithfully implement distributed controllers with the $\gamma_i^*$ decided from the Nash equilibrium.

3) High level decision making - Social optimum: Network-wide stability requires $\gamma_i^* \le 1$ and, equivalently, $s_i^*(R, c) \ge 1$. The system operator imposes $s_{\min} \ge 1$ as the minimum contribution made by the control authorities, and aims to maximize the aggregate of the unperturbed utilities:

$$\max_{R,c}\ \sum_{i\in V} h_i(G(R, c)) - G(R, c)$$
$$\text{s.t. } s_i^*(R, c) \ge s_{\min}, \quad \bar c \le \min\{G_U, R\},$$
$$R \ge 0, \quad c_i \ge 0, \quad \forall i \in V \qquad (8)$$

where the public good is defined by $G(R, c) \triangleq \bar s^*(R, c) - R$. Note that $s_i^*(R, c) \ge s_{\min}$ in problem (8) is a new constraint with respect to other problems of incentive games. The objective function of (8) is an approximation of the aggregate of the payoff functions. As $|V|$ increases, the objective function approximates the exact aggregate well because $N - 1 < \bar\beta \le N$. If $\beta_i = 1$ for $\forall i \in V$, the objective function of (8) is the exact aggregate of the payoff functions.

In problem (8), $s_i^*(R, c)$ is potentially non-concave in $R$ and $c$, and so are the objective function and the constraint. However, the objective function of problem (8) is concave with respect to $G(R, c)$. The value $G^*$ maximizes the objective function if there exists a pair $(R, c)$ such that $G(R, c) = G^*$ and the constraints are satisfied; this condition can be derived by taking the derivative of the concave objective function of (8) with respect to $G(R, c)$ and setting it equal to zero. If this is the case, such a pair $(R, c)$ is a solution of problem (8).

B. Analysis of Nash equilibrium

In this section, we first study the properties of the Nash equilibrium given $h_i(\cdot)$, $\beta_i$, $c_i$, and reward $R$ under Assumption 4.1. Through these properties, we check whether there exists a solution of problem (8) when perturbation is not allowed.

At a Nash equilibrium, the derivative of the payoff function (7) with respect to $s_i$ must satisfy $\frac{dU_i(s^*(R,c))}{ds_i} \le 0$ for $\forall i \in V$. This is called the first order condition. If control authority $i$ is active, $s_i^*(R, c) > 0$, then equality holds. To prove the first order condition by contradiction, assume $\frac{dU_i(s^*(R,c))}{ds_i} > 0$.
Then there is a sufficiently small $\delta > 0$ such that $U_i(s_i^* + \delta, s_{-i}^*) > U_i(s^*)$, which contradicts the definition of Nash equilibrium. The remaining part can be proven in a similar way.

Theorem 4.1: Under Assumption 4.1, the following properties hold at a Nash equilibrium.
(P4) Given any $R \ge 0$, $c_i \ge 0$, and $0 < \beta_i \le 1$ for $\forall i \in V$, there is a unique Nash equilibrium $s^*(R, c)$;
(P5) $\bar c \le G(R, c) \le G_U$;
(P6) If $R \ge R_L$, then $s_i^*(R, c) \ge L_i(R, c) > 0$, and $L_i(R, c)$ is strictly increasing in $R$ without bound;
(P7) The price of anarchy is characterized by
$$\frac{(|V_a(R,c)| - 1)(G(R,c) - \bar c)}{R + G(R,c) - \bar c} + \sum_{i\in V_a(R,c)} \beta_i + 1 - |V_a(R,c)| + \sum_{i\in V\setminus V_a(R,c)} \frac{c_i R}{(R + G(R,c) - \bar c)^2} \le \sum_{i\in V} \frac{dh_i(G(R,c))}{dv} \le \frac{(N-1)(G(R,c) - \bar c)}{R + G(R,c) - \bar c} + \bar\beta + 1 - N;$$
(P8) With the extended domain $R \in [0, \infty]$, $R = \infty$ is the unique solution of the constrained problem (8) without perturbation. Moreover, $\lim_{R\to\infty} \sum_{i\in V} \frac{dh_i(G(R, \vec 0))}{dv} = 1$.

Proof: It has been omitted due to space limitations.

As seen in (P8), $R = \infty$ is the unique solution of problem (8) when perturbation is not allowed. Therefore, the system operator cannot induce $G(R, \vec 0) = G^*$ without perturbation. Properties (P5) and (P7), however, suggest that there might exist a finite reward $R$ with perturbations. In (P5), if $\bar c = G_U = G^*$, then $G(R, c) = G^*$. In (P7), when all the control authorities are active, $\sum_{i\in V} \frac{dh_i(G(R,c))}{dv} = \frac{(N-1)(G(R,c) - \bar c)}{R + G(R,c) - \bar c} + \bar\beta + 1 - N$, and thus we solve $\frac{(N-1)(G(R,c) - \bar c)}{R + G(R,c) - \bar c} + \bar\beta + 1 - N = 1$ for optimality. The remaining difficulty is to satisfy the constraints in (8). In the next section, we identify pairs $(R, c)$ which are optimal solutions of (8).
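Since $G_U$ and $G^*$ are defined only implicitly through $\sum_{i\in V} dh_i/dv$, the system operator can recover them by one-dimensional root finding; a brief sketch (ours, with illustrative $h_i$ satisfying (A1)):

```python
from scipy.optimize import brentq

def solve_G(dh_list, target, hi=1e6):
    """Solve sum_i dh_i(G)/dv = target for G. Each dh_i is positive and strictly
    decreasing by the strict concavity of h_i, so the root is unique."""
    f = lambda G: sum(dh(G) for dh in dh_list) - target
    return brentq(f, 1e-9, hi)            # bracket assumed wide enough

# Illustrative: h_i(v) = 2*log(1+v), so dh_i(v)/dv = 2/(1+v); N = 3, beta_bar = 2.7.
dh, N, beta_bar = [lambda v: 2.0 / (1.0 + v)] * 3, 3, 2.7
G_U = solve_G(dh, beta_bar + 1 - N)       # sum_i dh_i(G_U)/dv = beta_bar + 1 - N
G_star = solve_G(dh, 1.0)                 # sum_i dh_i(G*)/dv = 1
print(G_U, G_star)                        # ~7.571 and 5.0
```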
C. Optimal incentive design and feasibility

In this section, we discuss the optimality of problem (8), where the system operator can ensure network-wide stability and simultaneously maximize the social welfare. If $G(R, c) = G^*$ and all the constraints in (8) hold, then a pair $(R, c)$ is an optimal solution of problem (8). Using this fact, Theorem 4.2 shows that the system operator can solve problem (8) by (P9) if there are perturbations $\beta_i \ne 1$ for some $i \in V$, or by (P10) if $\beta_i = 1$ for $\forall i \in V$.

Theorem 4.2: Under Assumption 4.1, the following properties hold.
(P9) If $\bar\beta \ne N$, then any pair $(R, c)$ which satisfies $\bar c \le \frac{G^*(\bar\beta - 1)}{N - 1}$, $R = \frac{(G^* - \bar c)(\bar\beta - 1)}{N - \bar\beta}$, and $s_i^*(R, c) \ge s_{\min}$ for $\forall i \in V$ is a solution of (8);
(P10) If $\bar\beta = N$, then any pair $(R, c)$ which satisfies $\bar c = G^* \le R$ and $s_i^*(R, c) \ge s_{\min}$ for $\forall i \in V$ is a solution of (8), and such a pair always exists;
(Pa) If $|V| \ge 3$, $G^* \ge \frac{N}{N-2} s_{\min}$, and $\frac{N(N-1)s_{\min}}{G_U} > \max_{i\in V}\{\beta_i - \frac{dh_i(G_U)}{dv}\}$, then a pair $(R, c)$ in (P9) exists;
(Pb) If $\frac{G^*(\bar\beta - 1)}{N - \bar\beta}\left(\frac{G^*(\bar\beta - 1)}{G^*(\bar\beta - 1) + G_U(N - \bar\beta)} - \max_{i\in V}\{\beta_i - \frac{dh_i(G_U)}{dv}\}\right) \ge s_{\min}$, then a pair $(R, c)$ in (P9) exists.

Proof: It has been omitted due to space limitations.

A solution pair described in (P9) does not always exist, because it may not be possible to satisfy the constraint $s_i^*(R, c) \ge s_{\min}$ with a bounded reward $R \le \frac{G^*(\bar\beta - 1)}{N - \bar\beta}$ and $c_i \le \bar c \le R$. (Pa) and (Pb) are motivated by the sufficient conditions of the solution at $\bar c = G^*$ and $\bar c = 0$, respectively. On the other hand, a solution pair in (P10) always exists because the feasible region of $R$ is unbounded in this case. An optimal solution pair, however, may not be unique. This fact suggests that some solutions can be regarded as the best in some sense. We address this question in the next section.

D. Optimal incentive design with least perturbations and reward

In this section, we study how to achieve $G(R, c) = G^*$ and satisfy all the constraints in (8) with minimum perturbations and reward. For this purpose, we introduce the problem:

$$\min_{R,c}\ R + \alpha\bar c$$
$$\text{s.t. } s_i^*(R, c) \ge s_{\min}, \quad \bar c \le \min\{G_U, R\}, \quad G(R, c) = G^*,$$
$$R \ge 0, \quad c_i \ge 0, \quad \forall i \in V \qquad (9)$$

where $\alpha \ge 0$ is a constant representing a weight on $\bar c$. A solution of problem (9) is also a solution of problem (8), because the feasible region of problem (9) is a subset of the solution set of (8). Note that the constraints in (9) are sufficient conditions for a solution of (8). We now assume that the system operator aims to solve problem (9), which is a non-convex program because $s_i^*(R, c)$ is a potentially non-concave function, as discussed before. In Theorem 4.3, we find an equivalent convex program so that the problem can be solved by well-studied methods. In Theorem 4.3, the sets $F_{(9)}$, $F_{(10)}$, and $F_{(11)}$ denote the feasible sets of problems (9), (10), and (11), respectively; likewise, $p^*_{(9)}$, $p^*_{(10)}$, and $p^*_{(11)}$ denote the optimal values of problems (9), (10), and (11), respectively.

Theorem 4.3: Under Assumption 4.1, the following statements hold.
(P11) If $\bar\beta \ne N$, any solution $(R, c)$ of program (10) is an element of the feasible set $F_{(9)}$, and the optimal value $p^*_{(10)}$ is an overestimate of $p^*_{(9)}$:
$$\min_{c}\ \left(\alpha - \frac{\bar\beta - 1}{N - \bar\beta}\right)\bar c + \frac{G^*(\bar\beta - 1)}{N - \bar\beta}$$
$$\text{s.t. } c_i \ge 0, \quad 0 \le \bar c \le \frac{G^*(\bar\beta - 1)}{N - 1},$$
$$c_i - \frac{\bar c(N-1) - 2(G^*(\bar\beta - 1) + G_U(N - \bar\beta))}{G^*(\bar\beta - 1)(N - \bar\beta)^2}\,\bar c(N-1)^2 \beta_i \ge -G^* \frac{dh_i(G_U)}{dv} - \frac{G^*(N-1)}{\bar\beta - 1} + \frac{\beta_i(N-1)(G^*(\bar\beta - 1) + G_U(N - \bar\beta))^2}{G^*(\bar\beta - 1)(N - \bar\beta)^2} + s_{\min}, \quad \forall i \in V \qquad (10)$$

with $R = \frac{(G^* - \bar c)(\bar\beta - 1)}{N - \bar\beta}$;
(P12) If $\bar\beta = N$, any pair $(R, c)$ is a solution of program (11) if and only if it is a solution of (9), and the optimal values are identical, $p^*_{(9)} = p^*_{(11)}$:

$$\min_{R,c}\ R + \alpha G^*$$
$$\text{s.t. } c_i \ge 0, \quad \bar c = G^*, \quad R \ge G^*,$$
$$c_i + R\,\frac{dh_i(G^*)}{dv} \ge s_{\min}, \quad \forall i \in V \qquad (11)$$

and a solution always exists.

Proof: It has been omitted due to space limitations.
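Program (11) is linear in $(R, c)$, so it can be handed to any LP solver. The following is a small sketch (ours) using scipy.optimize.linprog, with every numeric value chosen only for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def solve_program_11(G_star, dh_at_Gstar, s_min, alpha=0.0):
    """Solve the LP (11): min R + alpha*G_star over (R, c) subject to
    c_i >= 0, sum(c) = G_star, R >= G_star, and
    c_i + R * dh_i(G*)/dv >= s_min for all i.
    Decision vector: x = [R, c_1, ..., c_N]."""
    N = len(dh_at_Gstar)
    cost = np.r_[1.0, np.zeros(N)]                 # alpha*G_star is a constant offset
    A_ub, b_ub = [], []
    for i in range(N):                             # -c_i - R*dh_i(G*) <= -s_min
        row = np.zeros(N + 1)
        row[0], row[i + 1] = -dh_at_Gstar[i], -1.0
        A_ub.append(row); b_ub.append(-s_min)
    A_eq, b_eq = [np.r_[0.0, np.ones(N)]], [G_star]        # sum(c) = G_star
    bounds = [(G_star, None)] + [(0.0, None)] * N          # R >= G_star, c_i >= 0
    res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=b_eq, bounds=bounds)
    return res.x[0], res.x[1:], res.fun + alpha * G_star   # R, c, optimal value

# Illustrative numbers: N = 3, dh_i(G*)/dv = 1/3 each (summing to 1), s_min = 1.
print(solve_program_11(G_star=5.0, dh_at_Gstar=[1/3] * 3, s_min=1.0))
```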
Program (10) guarantees a sub-optimal solution, whereas (11) guarantees an optimal one. To guarantee the constraint $s_i^*(R, c) \ge s_{\min}$ in (9), it is required to find a lower bound of $s_i^*(R, c)$ or the value itself. A new problem is equivalent to (9) only when the value $s_i^*(R, c)$ is given; program (11) is this case. In the case of (10), however, we only have a lower bound. Note that the potentially non-convex program (9) is equivalent to the linear program (11) because the potentially non-concave constraints can be replaced by linear ones: the constraint $s_i^*(R, c) \ge s_{\min}$ can be changed to a linear one, and once all the control authorities are active, the constraint $G(R, c) = G^*$ can be replaced by the linear constraint $\bar c = G^*$ regardless of the reward size $R$. By solving the convex programs described in Theorem 4.3, the system operator can choose minimum or sub-minimum perturbations and reward maximizing the social welfare.

V. CONCLUSIONS

We have studied optimal incentive design for distributed stabilizing control of a class of nonlinear dynamic networks. We proposed a two-step solution. As a first step, a new distributed scalar stability index is derived. As a second step, we propose an incentive mechanism which encourages the control authorities to choose small enough stability indices. Moreover, an optimal incentive design is studied where network-wide stability is ensured and the social welfare is maximized while minimizing perturbations and reward.

APPENDIX I
DISTRIBUTED ALGORITHM FOR CONTROL GAIN k

This appendix, assuming that the directed k graph $G_k$ is acyclic, addresses the question of how to find a set of control gains (5). Algorithm 2 presents a distributed algorithm that finds a set of control gains $k$ described in (5). In Algorithm 2, we assume that each node $(i, l) \in V_k$ already knows the sets $Parent_{i,l}$ and $Child_{i,l}$, and the stability index $\gamma_i^*$.

Corollary 1.1: Under the assumption that the directed k graph $G_k$ is acyclic and each $(i, l) \in V_k$ already knows the sets $Parent_{i,l}$ and $Child_{i,l}$, Algorithm 2 determines a set of $k$ which satisfies (5) in a distributed way in time $O(|V_k|)$.

Proof: It has been omitted due to space limitations.

Algorithm 2 Distributed Algorithm for control gain $k$
1: $t = 1$; $\epsilon = 0.1$;
2: for $(i, l) \in V_k$ do
3:   $D_{j,q} = 0$ for $\forall (j, q) \in V_k$; choose constants $\delta_{i,l} = \delta_{ij,l} = \frac{1}{m_i + 2}$;
4: end for
5: while $t \le N \times \max_{i\in V} n_i$ do
6:   for $(i, l) \in V_k$ do
7:     if $(1 - D_{i,l}) \Pi_{(j,q)\in Parent_{i,l}} D_{j,q} = 1$ then
8:       $k_{i,l} = \max_{j\in N_i}\{\frac{(m_i+2)\hat L_{ij,l}}{\delta_{ij,l}\gamma_i^*}, \frac{m_i+2}{\delta_{i,l}\gamma_i^*}\} + \epsilon$; $D_{i,l} = 1$;
9:       Send $k_{i,l}$ and $D_{i,l} = 1$ to $(j, q) \in Child_{i,l}$;
10:    end if
11:  end for
12:  $t = t + 1$;
13: end while
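For intuition, here is a centralized simulation of Algorithm 2's message passing (ours, not the paper's code); `L_hat` is an assumed callable standing in for $\hat L_{ij,l}$, which may depend on already-fixed parent gains:

```python
def distributed_gain_selection(Vk, parents, neighbors, L_hat, gamma_star, eps=0.1):
    """Synchronous simulation of Algorithm 2. Each node (i, l) waits until every
    parent in the k graph has fixed its gain, then chooses k_{i,l} via (5) plus a
    margin eps. L_hat(i, l, j, k) returns \\hat L_{ij,l} given the gains fixed so far.

    Vk: list of (i, l) pairs; parents: dict node -> list of parent nodes;
    neighbors: dict i -> list of neighbor indices j; gamma_star: dict i -> gamma_i*.
    """
    k, done = {}, {v: False for v in Vk}
    for _ in range(len(Vk)):                      # at most |Vk| sweeps suffice (acyclic)
        for (i, l) in Vk:
            if done[(i, l)] or not all(done[p] for p in parents[(i, l)]):
                continue
            m = len(neighbors[i])
            delta = 1.0 / (m + 2)                 # delta_{i,l} = delta_{ij,l} = 1/(m_i+2)
            bound = (m + 2) / (delta * gamma_star[i])
            for j in neighbors[i]:
                bound = max(bound, (m + 2) * L_hat(i, l, j, k) / (delta * gamma_star[i]))
            k[(i, l)] = bound + eps               # then broadcast k_{i,l} to Child_{i,l}
            done[(i, l)] = True
    return k
```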
REFERENCES

[1] T. Basar. Affine incentive schemes for stochastic systems with dynamic information. SIAM Journal on Control and Optimization, 22(2):199–210, 1984.
[2] C. Bull, A. Schotter, and K. Weigelt. Tournaments and piece rates: An experimental study. The Journal of Political Economy, pages 1–33, 1987.
[3] E. Clarke. Multipart pricing of public goods. Public Choice, 11(1):17–33, 1971.
[4] R. Gibbons. Piece-rate incentive schemes. Journal of Labor Economics, pages 413–429, 1987.
[5] T. Giebe and P. Schweinzer. Consuming your way to efficiency. SFB/TR, 59:1–8, 2011.
[6] T. Groves. Incentives in teams. Econometrica, 41(4):617–631, 1973.
[7] Y. Ho, P. Luh, and G. Olsder. A control-theoretic view on incentives. Automatica, 18(2):167–179, 1982.
[8] H.K. Khalil. Nonlinear Systems. Upper Saddle River: Prentice Hall, 2002.
[9] E.P. Lazear and S. Rosen. Rank-order tournaments as optimum labor contracts. Journal of Political Economy, 89(5):841–864, 1981.
[10] M. Goodrich and R. Tamassia. Data Structures and Algorithms in Java. John Wiley and Sons, 2008.
[11] J. Morgan. Financing public goods by means of lotteries. Review of Economic Studies, 67:761–784, 2000.
[12] J. Morgan and M. Sefton. Funding public goods with lotteries: experimental evidence. The Review of Economic Studies, 67(4):785–810, 2011.
[13] J. Nash. Non-cooperative games. Annals of Mathematics, pages 286–295, 1951.
[14] N. Nisan and A. Ronen. Algorithmic mechanism design. In 31st Annual ACM Symposium on Theory of Computing, pages 129–140, 1999.
[15] D.C. Parkes and J. Shneidman. Distributed implementations of Vickrey-Clarke-Groves mechanisms. In International Joint Conference on Autonomous Agents and Multi-agent Systems, pages 261–268, 2004.
[16] A. Petcu, B. Faltings, and D.C. Parkes. MDPOP: Faithful distributed implementation of efficient social choice problems. In International Joint Conference on Autonomous Agents and Multi-agent Systems, pages 1397–1404, 2006.
[17] R. Sedgewick and K. Wayne. Algorithms. Pearson Education, 2011.
[18] N. Sandell Jr., P. Varaiya, M. Athans, and M. Safonov. Survey of decentralized control methods for large scale systems. IEEE Transactions on Automatic Control, 23(2):108–128, 1978.
[19] D. Siljak. Decentralized Control of Complex Systems. Academic Press, 1991.
[20] E.D. Sontag. Smooth stabilization implies coprime factorization. IEEE Transactions on Automatic Control, 34(4):435–443, 1989.
[21] T. Tanaka, F. Farokhi, and C. Langbort. A faithful distributed implementation of dual decomposition and average consensus algorithms. In IEEE Conference on Decision and Control, pages 2985–2990, 2013.
[22] W. Vickrey. Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance, 16(1):8–37, 1961.
[23] M. Zhu, N. Li, W. Shi, and R. Gadh. Distributed access control of volatile renewable energy resources. In IEEE PES General Meeting - Conference & Exposition, pages 1–5, 2014.