SIAM J. MATRIX ANAL. APPL. Vol. 34, No. 3, pp. 999–1013
© 2013 Society for Industrial and Applied Mathematics
OPTIMIZING THE SPECTRAL RADIUS∗

Y. NESTEROV† AND V. Y. PROTASOV‡

Abstract. We suggest a new approach to finding the maximal and the minimal spectral radii of linear operators from a given compact family of operators, which share a common invariant cone (e.g., family of nonnegative matrices). In the case of families with the so-called product structure, this leads to efficient algorithms for optimizing the spectral radius and for finding the joint and lower spectral radii of the family. Applications to the theory of difference equations and to problems of optimizing the spectral radius of graphs are considered.

Key words. nonnegative matrix, spectral radius, optimization methods, difference equation, spectrum of a graph

AMS subject classifications. 15B48, 90C26, 15A42

DOI. 10.1137/110850967
1. Introduction. In this paper, we study the problem of finding the minimal and the maximal spectral radii of finite-dimensional linear operators from some compact family M. The main assumption is that all these operators share a common invariant cone K, which is supposed to be convex, closed, solid (possessing a nonempty interior), and pointed (not containing a straight line). Such operators possess most properties of nonnegative matrices and have found many applications in functional and convex analysis, large networks, graphs, etc. (see [9, 19, 17] and references therein).

In section 2 we present a general approach to finding the maximal/minimal spectral radii of a family of operators with special structure. Our main attention is on the so-called product families, for which the problem of optimizing the spectral radius can be efficiently solved (sections 3 and 4). A special case of such families was considered in the recent paper [1]. It was shown that, for families of nonnegative matrices with independent column uncertainty, the maximal spectral radius is actually equal to the joint spectral radius. This makes the problem of computing the joint spectral radius, which is very hard for general matrices, solvable in this case within finite time, by examination of all matrices of the family. However, the total number of matrices in a family with independent column uncertainty is exponential in the dimension. Already in the simplest case, when each uncertainty set consists of two elements, the family contains 2^n matrices, where n is the dimension. Therefore, the full search may require enormous computations for high dimensions n. The question arises whether there is an efficient algorithm for finding the maximal spectral radius of such families. In this paper, we develop two efficient methods for maximizing the spectral radius of product families: the spectral simplex method and the two-level optimization method.
∗Received by the editors October 10, 2011; accepted for publication (in revised form) by M. P. Friedlander April 23, 2013; published electronically July 11, 2013. http://www.siam.org/journals/simax/34-3/85096.html

†Center for Operations Research and Econometrics (CORE), Université Catholique de Louvain (UCL), B-1348 Louvain-la-Neuve, Belgium ([email protected]). This author was supported by the grant "Action de Recherche Concertée ARC 04/09-315" from the "Direction de la Recherche Scientifique - Communauté Française de Belgique" and by the Laboratory of Structural Methods of Data Analysis in Predictive Modeling, MIPT, RF government grant ag. 11.G34.31.0073.

‡Department of Mechanics and Mathematics, Moscow State University, 119992, Moscow, Russia ([email protected]). This author was supported by the RFBR grants 13-01-00642 and 11-01-00329, by the grant NSh-6003.2012.1, and by a grant of the Dynasty foundation.

Moreover, a new construction in the proof makes it possible to extend our approach to
a wide class of families with convex invariant cones (product families) and to prove similar results not only for the maximal but also for the minimal spectral radius. The two new methods for maximizing/minimizing the spectral radius are presented and analyzed in section 5. Finally, in section 6, we give two examples of applications. The first application is the optimization of the rate of growth of solutions of a nonstationary difference equation. The second one is the optimization of the spectrum of a graph. Some other applications are also mentioned.

Let us start with some notation. Denote by E a finite-dimensional linear vector space, by n its dimension, and by E* its dual space composed of all linear functions s on E. The value of s at x is denoted by ⟨s, x⟩. This scalar product defines the adjoint operator A* of a linear operator A : E → E by the identity

⟨s, Ax⟩ = ⟨A*s, x⟩,   x ∈ E, s ∈ E*.
Consider a linear operator A : E → E with an invariant, closed, convex, solid, pointed cone K ⊂ E:

(1.1)   AK ⊆ K.
By the Kreĭn–Rutman theorem [9] there is an eigenvector x(A) of the operator A such that

(1.2)   A x(A) = ρ(A) x(A),   x(A) ∈ K,
where ρ(A) is the spectral radius of A (the largest modulus of its eigenvalues). This vector will be referred to as a dominant eigenvector of A, or a Perron–Frobenius eigenvector. The Kreĭn–Rutman theorem extends the well-known Perron–Frobenius theorem from families of nonnegative matrices to families of operators sharing a common invariant cone. Usually we normalize x(A) so that ⟨e*, x(A)⟩ = 1, where e* is an arbitrary fixed normalizing vector from the interior of K*, the cone dual to K:

K* = {s ∈ E* : ⟨s, x⟩ ≥ 0, x ∈ K} ⊂ E*.
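For the standard cone K = R^n_+, with e* chosen as the all-ones vector, the dominant eigenpair and the normalization ⟨e*, x(A)⟩ = 1 can be computed by plain power iteration. A minimal sketch (function names and tolerances are ours, not from the paper):

```python
import numpy as np

def dominant_eigenpair(A, e_star=None, tol=1e-12, max_iter=10_000):
    """Power iteration for the Perron-Frobenius (dominant) eigenpair of a
    nonnegative matrix A (invariant cone K = R^n_+), with the eigenvector
    normalized so that <e*, x> = 1 for a fixed e* in int K*."""
    n = A.shape[0]
    e_star = np.ones(n) if e_star is None else e_star
    x = np.ones(n) / n
    rho = 0.0
    for _ in range(max_iter):
        y = A @ x
        rho = e_star @ y                 # at the fixed point, <e*, A x> = rho(A)
        x_new = y / rho                  # renormalize: <e*, x_new> = 1
        if np.linalg.norm(x_new - x, 1) < tol:
            x = x_new
            break
        x = x_new
    return rho, x

A = np.array([[0.5, 0.4],
              [0.3, 0.8]])
rho, x = dominant_eigenpair(A)           # rho is approximately 1.0275
```

The returned value rho equals ⟨e*, A x(A)⟩, which is exactly the expression for the spectral radius used below.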
Under our assumptions on K, the dual cone K* is also closed, convex, solid, and pointed. Note that the vector x(A) is, in general, not unique (in this case we attribute this notation to any of them). Thus, the spectral radius satisfies ρ(A) = ⟨e*, A x(A)⟩. If x(A) is unique and belongs to int K, then the operator A is called irreducible. Condition (1.1) is satisfied if and only if A*K* ⊆ K*. Hence, using an arbitrary vector e ∈ int K, we can define in a similar way the dominant eigenvector s(A*) ∈ K* of the adjoint operator:

A* s(A*) = ρ(A*) s(A*),   ⟨s(A*), e⟩ = 1.

Let us recall that ρ(A*) = ρ(A). In what follows, we often use the following simple observations.

Lemma 1. Let A satisfy condition (1.1) and λ ≥ 0.
1. If x ∈ int K and λx − Ax ∈ K, then λ ≥ ρ(A).
2. If x ∈ int K and λx − Ax ∈ int K, then λ > ρ(A).
3. If x ∈ K \ {0} and Ax − λx ∈ K, then λ ≤ ρ(A).
4. If x ∈ K and Ax − λx ∈ int K, then λ < ρ(A).

Proof. For proving items 1 and 2, note that

0 ≤ ⟨s(A*), λx − Ax⟩ = λ⟨s(A*), x⟩ − ⟨A*s(A*), x⟩ = λ⟨s(A*), x⟩ − ρ(A*)⟨s(A*), x⟩ = (λ − ρ(A)) ⟨s(A*), x⟩.
Since x ∈ int K, we have ⟨s(A*), x⟩ > 0. Moreover, under the assumptions of item 2, the first inequality in the latter chain is strict.

In order to justify item 3, consider the sequence x_k := A^k x_0 with x_0 ∈ K \ {0}. Let us fix some s ∈ K* with ⟨s, x_0⟩ = 1. By the assumption of item 3, we have x_1 − λx_0 ∈ K. Therefore, by (1.1), we have x_{k+1} − λx_k ∈ K, k ≥ 0. Thus, ⟨s, A^k x_0⟩ = ⟨s, x_k⟩ ≥ λ^k. However, ρ(A) is an upper bound for the asymptotic growth of the norm of the operator A^k. Hence, we obtain the desired inequality. Clearly, under the conditions of item 4, a small increase of λ does not violate the conditions of item 3.

2. Optimization problems with spectral radius. Let M be a compact set of linear operators A : E → E. The main goal of this paper is to develop efficient algorithmic tools for approximating the following values:
(2.1)   ρ*(M) = max_{A∈M} ρ(A),

(2.2)   ρ_*(M) = min_{A∈M} ρ(A).
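For the cone K = R^n_+, items 1 and 3 of Lemma 1 are the classical Collatz–Wielandt bounds, which bracket the spectral radius entering (2.1) and (2.2). A quick numerical sanity check (random toy matrix; names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 4))               # nonnegative matrix: K = R^4_+ is invariant
rho = max(abs(np.linalg.eigvals(A)))

x = rng.random(4) + 0.1              # some x in int K
# Item 1 of Lemma 1: lam is the smallest number with lam*x - A x in K,
# hence lam >= rho(A).
lam = max((A @ x) / x)
# Item 3 of Lemma 1: mu is the largest number with A x - mu*x in K,
# hence mu <= rho(A).
mu = min((A @ x) / x)

assert mu <= rho <= lam
```

Taking x equal to the dominant eigenvector makes both bounds tight, which is the mechanism behind the optimality conditions derived next.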
Of course, very often these problems are computationally intractable. Nevertheless, let us write for them some necessary and sufficient optimality conditions based on the statements of Lemma 1. We consider the case when our family satisfies the following assumption.

Assumption 1. All operators of the family M share the same invariant cone K.

Some conditions for a family of operators to have a common invariant cone can be found in [19, 17]. Let us look first at the maximization problem (2.1). Denote by co(X) and cl(X) the convex hull and the closure, respectively, of a set X. Let

L_0(Ā) := {A : (Ā − A) x(Ā) ∈ int K},   L(Ā) := cl(L_0(Ā)),
U_0(Ā) := {A : (A − Ā) x(Ā) ∈ int K},   U(Ā) := cl(U_0(Ā)).

Assume that for some operator Ā ∈ M we have

(2.3)   x(Ā) ∈ int K,   L(Ā) ⊇ M.
This means that ρ(Ā) x(Ā) − A x(Ā) ∈ K for any A ∈ M. Hence, by item 1 of Lemma 1, ρ(Ā) = ρ*(M). Thus, condition (2.3) is a sufficient characterization of the optimal solution of problem (2.1). Assume now that for two operators A and Ā from M we have (A − Ā) x(Ā) ∈ int K. In view of item 4 of Lemma 1 this means that Ā cannot be an optimal solution of problem (2.1). Hence, the following condition is a necessary characterization of the optimal solution of problem (2.1):

(2.4)   U_0(Ā) ∩ co(M) = ∅.

The latter condition naturally leads to a simple local search procedure, which we call the spectral analytic center method. Let us fix for the cone K a convex barrier function F(x), and let A_0 ∈ M. Then we iterate the following operation:

(2.5)   A_{k+1} = arg min_{A ∈ co(M)} F((A − A_k) x(A_k)).
If U_0(A_k) ∩ co(M) ≠ ∅, then in view of item 4 of Lemma 1 we have ρ(A_{k+1}) > ρ(A_k). In general we cannot say more. However, in section 5.1 we will show that for some special families of operators a similar scheme finds the exact solution of problem (2.1) in a finite number of steps.

For the minimization problem (2.2), the necessary and sufficient optimality conditions can be derived in a similar way. If for some Â ∈ M we have

(2.6)   U(Â) ⊇ co(M),

then ρ(Â) = ρ_*(M) (see item 3 of Lemma 1). If Â is an optimal solution to (2.2) and x(Â) ∈ int K, then

(2.7)   L_0(Â) ∩ co(M) = ∅

(see item 2 of Lemma 1). The spectral analytic center method for minimization problem (2.2) is as follows:

(2.8)   A_{k+1} = arg min_{A ∈ co(M)} F((A_k − A) x(A_k)).
However, it is justified only if we can ensure that x(A_k) ∈ int K for all k ≥ 0. Since problems (2.1), (2.2) are often difficult, let us define for them some convex relaxations. Denote

(2.9)   λ*(M) = inf_{λ∈R, x∈int K} {λ : λx − Ax ∈ K ∀A ∈ M},
        λ_*(M) = sup_{λ∈R, x∈K} {λ : Ax − λx ∈ K ∀A ∈ M}.

Clearly,

(2.10)   λ_*(M) ≤ ρ_*(M) ≤ ρ*(M) ≤ λ*(M).
At the same time, the characteristics λ*(M) and λ_*(M) are computable. Choose some e ∈ int K, e* ∈ int K*, and denote Δ = Δ_{e*} = {x ∈ K : ⟨e*, x⟩ = 1} and Δ* = Δ*_e = {s ∈ K* : ⟨s, e⟩ = 1}. Since both cones K and K* are solid and pointed, both sets Δ and Δ* are compact. Consider the function

ψ*(x, λ) = max_{A∈M} max_{s∈Δ*} ⟨s, Ax − λx⟩.

For a fixed λ, the function ψ*(x, λ) is convex in x as a pointwise maximum of convex support functions ψ*_A(x) = max_{s∈Δ*} ⟨s, Ax − λx⟩ over all A ∈ M. If ψ*(x, λ) ≤ 0 for some x ∈ int K, then

λx − Ax ∈ K   ∀A ∈ M,

and therefore λ ≥ λ*(M). Note that we can justify the validity of this upper bound by checking the nonpositivity of the following decreasing function:

(2.11)   ξ*(λ) = inf_{v∈int Δ} ψ*(v, λ).
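For a concrete feel, take K = R²_+ with e = e* = (1, 1), so Δ is the standard simplex and ψ*(x, λ) = max_{A∈M} max_i ((A − λI)x)_i. Then ξ*(λ) is the optimal value of a small convex (in fact linear) program; the sketch below just scans a dense grid over the one-dimensional simplex instead of calling an LP solver (family and names are our own toy data):

```python
import numpy as np

def xi_star(family, lam, grid=20001):
    """xi*(lam) for K = R^2_+, e = e* = (1,1): minimize over the simplex
    Delta = {x >= 0, x_1 + x_2 = 1} the value
    psi*(x, lam) = max_{A in family} max_i ((A - lam*I) x)_i."""
    a = np.linspace(0.0, 1.0, grid)
    X = np.vstack([a, 1.0 - a])                      # all grid points, shape (2, grid)
    vals = np.full(grid, -np.inf)
    for A in family:
        G = A - lam * np.eye(2)
        vals = np.maximum(vals, (G @ X).max(axis=0))  # psi*(x, lam) on the grid
    return float(vals.min())

M = [np.array([[0.2, 0.7], [0.5, 0.1]]),
     np.array([[0.4, 0.3], [0.2, 0.6]])]
rho_max = max(max(abs(np.linalg.eigvals(A))) for A in M)

assert xi_star(M, 0.0) > 0             # lambda below the root lambda*
assert xi_star(M, rho_max + 0.5) < 0   # lambda above the root for this family
```

The two assertions exhibit the sign change of the decreasing function ξ*; its root is the relaxation value λ*(M).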
Since ξ*(λ) is continuous, the value λ*(M) coincides with its root, i.e., with the solution of the equation ξ*(λ) = 0. Later, in section 5.2, we will discuss an efficient computational
strategy for approximating this root. Here we just mention that the value ξ*(λ) is an optimal value of the convex minimization problem (2.11). As far as the function ψ*(x, λ) is computable, problem (2.11) can be solved up to reasonable accuracy in polynomial time. We will consider important application examples in section 6. Similarly, the bound λ_*(M) can be characterized by the following functions:

         ψ_*(x, λ) = max_{A∈M} max_{s∈Δ*} ⟨s, λx − Ax⟩,
(2.12)   ξ_*(λ) = inf_{x∈Δ} ψ_*(x, λ).
If ξ_*(λ) ≤ 0, then λ ≤ λ_*(M). Thus, λ_*(M) is a root of the increasing function ξ_*(λ).

Now we extend our considerations to the joint spectral radius (JSR) and the lower spectral radius (LSR) of families of operators. We are going to establish the relation of these joint spectral characteristics with the maximal and minimal spectral radii and with the values λ* and λ_*. The JSR σ*(M) is defined as the following limit: σ*(M) = lim_{k→∞} max_{B∈M^k} ‖B‖^{1/k}, where M^k is the set of all products of length k (with repetition permitted) of operators from M. This limit exists for every compact family of operators M and does not depend on the operator norm. The JSR is the exact upper bound of the asymptotic growth of the discrete-time switching linear system

(2.13)   x_0 ∈ E,   x_{k+1} = A_k x_k,   A_k ∈ M,   k ≥ 0.
The JSR has numerous applications and a rich history; see [1, 18] and references therein. For every family M we obviously have σ*(M) ≥ ρ*(M), which gives a lower bound for the JSR. In order to obtain an upper bound, without loss of generality we can consider x_0 ∈ K (otherwise, we represent x_0 as a difference of two vectors from K). In this case x_k ∈ K for all k. Let us choose some vector s ∈ int K* and a number λ > 0 such that

λs − A*s ∈ K*   ∀A* ∈ M*.

Then ⟨x_{k+1}, s⟩ = ⟨A_k x_k, s⟩ = ⟨x_k, A*_k s⟩ ≤ λ⟨x_k, s⟩. Iterating m times we get ⟨x_m, s⟩ ≤ λ^m ⟨x_0, s⟩ for every m. Hence, σ*(M) ≤ λ, and therefore σ*(M) ≤ λ*(M*). Since λ*(M*) = λ*(M), we obtain the following sandwich inequality:

(2.14)   ρ*(M) ≤ σ*(M) ≤ λ*(M).
In general, among these three values, only λ*(M) is computationally tractable. The two other values usually need an extraordinarily high volume of computations (see the bibliography in [1]). Nevertheless, in the following sections we point out some nontrivial families of operators for which the left-hand side of inequality (2.14) coincides with its right-hand side. In this case all three values are equal and, hence, computable. Observe that the sufficient condition (2.3) guarantees that ρ(Ā) ≥ λ*(M), and therefore all the inequalities in (2.14) become equalities.

Consider now the LSR σ_*(M), which is an exact lower bound for the asymptotic maximal growth of the discrete-time switching linear system (2.13). It is defined in
a similar way to the JSR, but using the minimum instead of the maximum: σ_*(M) = lim_{k→∞} min_{B∈M^k} ‖B‖^{1/k}. Let us remark that the LSR is usually even harder to compute than the JSR [18]. Clearly, ρ_*(M) ≥ σ_*(M). For obtaining a lower bound for σ_*(M), we can assume again that x_0 ∈ K, in which case x_k ∈ K for all k. Let us choose some vector s ∈ int K* and λ > 0 such that

A*s − λs ∈ K*   ∀A* ∈ M*.

Then ⟨x_{k+1}, s⟩ = ⟨A_k x_k, s⟩ = ⟨x_k, A*_k s⟩ ≥ λ⟨x_k, s⟩. Hence, σ_*(M) ≥ λ, and we obtain another sandwich inequality:

(2.15)   ρ_*(M) ≥ σ_*(M) ≥ λ_*(M).
Again, the sufficient condition (2.6) guarantees that ρ(Â) ≤ λ_*(M), and therefore all inequalities in (2.15) become equalities. In the subsequent sections we will show that for some special families of operators one can efficiently solve optimization problems (2.1) and (2.2) even if the uncertainty set M has a combinatorial structure.

3. Product families. We have derived bounds (2.14) and (2.15) for the JSR and LSR of a family M and for its maximal and minimal spectral radii. They are valid for every family of operators sharing a common invariant cone. In this section we show that for some important cases these bounds are satisfied as equalities. Hence, the values σ*, ρ* and σ_*, ρ_* are equal to the efficiently computable values λ* and λ_*, respectively.

Let us consider families of operators with a special structure. First, we introduce an intermediate N-dimensional linear space E_1. This space is assumed to be equipped with a partial ordering defined by a given pointed convex cone K_1 ⊂ E_1. We fix a basic operator B : E_1 → E such that BK_1 ⊆ K. Consider now a family F composed of some operators F : E → E_1. Then we can define our main family of operators as M = {A = BF, F ∈ F}. Assuming FK ⊆ K_1 for any F ∈ F, we guarantee that all operators from the family M share the same invariant cone K, since BF(K) ⊂ B(K_1) ⊂ K.

The following lemma plays a crucial role in our further study. It establishes sufficient conditions for the sandwich inequalities (2.14) and (2.15) to be satisfied as equalities.

Lemma 2. Let Ā = BF̄ ∈ M. If x(Ā) ∈ int K and

(3.1)   (F̄ − F) x(Ā) ∈ K_1   ∀F ∈ F,

then ρ(Ā) = ρ*(M) = σ*(M) = λ*(M). If x(Ā) ∈ K and

(F − F̄) x(Ā) ∈ K_1   ∀F ∈ F,

then ρ(Ā) = ρ_*(M) = σ_*(M) = λ_*(M).

Proof. Indeed, in view of condition (3.1), for any A ∈ M we have

ρ(Ā) x(Ā) − A x(Ā) = (Ā − A) x(Ā) = B(F̄ − F) x(Ā) ∈ K.
Thus, for λ = ρ(Ā) we have λx(Ā) − Ax(Ā) ∈ K, and therefore ρ(Ā) ≥ λ*(M) by the definition (2.9). It remains to use (2.14). For the second statement, note that

Ax(Ā) − ρ(Ā) x(Ā) = (A − Ā) x(Ā) = B(F − F̄) x(Ā) ∈ K

by (3.1). Hence, λ_*(M) ≥ ρ(Ā) ≥ ρ_*(M). It remains to use (2.15).

Let us note that the assumptions of the two parts of Lemma 2 are slightly different. In the first part, we need to impose that x(Ā) ∈ int K, and this condition, in general, cannot be omitted. In the second part this assumption is not necessary.

Lemma 2 highlights the conditions under which the computation of the maximal spectral radius ρ*(M) and of the JSR σ*(M) is reduced to finding the value λ*(M) of optimization problem (2.9). Moreover, in this case the maximal spectral radius is attained at Ā. The next lemma sharpens this result. It shows that for positive operators, condition (3.1) leads to a strict inequality between spectral radii. Recall that an operator A is called positive on K if Ax ∈ int K for all x ∈ K \ {0}.

Lemma 3. Let F, F̃ be arbitrary operators from F, and let A = BF, Ã = BF̃. If the operator Ã is positive on K and

d := (F̃ − F) x(A) ∈ K_1,   d ≠ 0,

then ρ(Ã) > ρ(A). If A is positive on K and (F − F̃) x(A) ∈ K_1 \ {0}, then ρ(Ã) < ρ(A).

Proof. Denote x̂ := Ã x(A) = B(F + (F̃ − F)) x(A) = ρ(A) x(A) + Bd. Then

Ã x̂ = ρ(A) Ã x(A) + Ã Bd = ρ(A) x̂ + Ã Bd.

Since Ã is positive on K, we have ÃBd ∈ int K. Hence, by item 4 of Lemma 1, we obtain ρ(Ã) > ρ(A). For proving the second part of the lemma, we use item 2 of Lemma 1.

We are ready now to formulate the main results that allow the application of Lemmas 2 and 3 for computing the maximal/minimal spectral radii. To this end, we need to impose two extra assumptions on the family M.

Assumption 2.

(3.2)   Fx ∈ int K_1   ∀x ∈ K \ {0} and F ∈ F,   B(int K_1) ⊆ int K.

In the remainder of this section, we assume that Assumption 2 is always true. Thus, all operators A ∈ M are positive on K: Ax ∈ int K for all x ∈ K \ {0}. This assumption is technical and will be omitted in the next section. The second assumption is more essential. It concerns the structure of the auxiliary cone K_1.

Assumption 3. The cone K_1 is the positive orthant in R^N: K_1 ≡ R^N_+.

Under this assumption, the operators B and F can be naturally associated with their matrices written by columns as follows:

(3.3)   B = (B_1, ..., B_N),   F^T = (F_1, ..., F_N),   B_j ∈ K,   F_j ∈ K*,   j = 1, ..., N.
Then the matrix of the operator A is represented as A = BF. From now on, we use the same notation for operators and for the corresponding matrices. In this situation, Lemma 3 immediately implies the following theorem.

Theorem 1 (on rank-one correction). Let x = x(A) ∈ K be a dominant eigenvector of a positive operator A ∈ M. Assume that for some index j and for some f̃ ∈ int K* we have ⟨f̃, x⟩ > ⟨F_j, x⟩. Then for the operator Ã = A + B_j (f̃ − F_j)^T we have ρ(Ã) > ρ(A). If ⟨f̃, x⟩ < ⟨F_j, x⟩, then ρ(Ã) < ρ(A).

Proof. Writing F̃ for the matrix F with the jth row replaced by f̃^T, we get (F̃ − F) x(A) ∈ R^N_+. It remains to use Lemma 3. The proof of the second statement is similar.

Thus, if a vector f̃ has a bigger projection onto the dominant eigenvector x(A) than the vector F_j, then the addition of the rank-one matrix B_j (f̃ − F_j)^T increases the spectral radius of A. This explains the main idea behind the method we are going to present: maximization (or minimization) of the spectral radius over the family M by successive rank-one corrections. Of course, this approach does not work for arbitrary matrix families; some structural assumptions on M are required. Let us now define the corresponding families. Let F_j ⊂ K*, j = 1, ..., N, be given compact sets, and let B be as in (3.3). The family

(3.4)   M = M[F_1, ..., F_N] = {A = BF, F_j ∈ F_j, j = 1, ..., N}
is called a product family of operators corresponding to the independent column uncertainty sets F_1, ..., F_N. A product family is positive on K if F_j ⊂ int K* for all j = 1, ..., N.

Example 1. The simplest example of a product family was considered in [1]. This is a set of nonnegative n × n-matrices with independent column uncertainty sets F_j, j = 1, ..., n. In this case K = R^n_+, N = n, B is the identity operator, and {F_j} are arbitrary compact subsets of R^n_+. The family M consists of all matrices such that for every j = 1, ..., n the jth column belongs to F_j. This product family is positive on K if all column uncertainty sets consist of strictly positive vectors. In this case, all matrices in M are strictly positive.

Theorem 2. For every positive product family we have
(3.5)   λ*(M) = σ*(M) = ρ*(M),   λ_*(M) = σ_*(M) = ρ_*(M).
Proof. Denote by Ā ≡ BF̄ the solution of optimization problem (2.1) and by x̄ = x(Ā) its dominant eigenvector. In view of Theorem 1, we have

(3.6)   ⟨F_j, x̄⟩ ≤ ⟨F̄_j, x̄⟩   ∀F_j ∈ F_j,   j = 1, ..., N.

Therefore,

ρ(Ā) x̄ − BF x̄ = B(F̄ − F) x̄ ∈ K   ∀F ∈ F.

Hence, λ*(M) ≤ ρ(Ā), and all inequalities in (2.14) are satisfied as equalities. The second chain of equalities can be proved in the same way.
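Theorem 2 can be sanity-checked numerically on data of the type of Example 1: since σ*(M) = ρ*(M) for a positive product family, no finite product of its matrices can beat ρ*(M) after normalization. A sketch (the uncertainty sets below are our own toy data, not from the paper):

```python
import numpy as np
from itertools import product

def spec_rad(A):
    return max(abs(np.linalg.eigvals(A)))

# Example 1 setting: K = R^2_+, B = I, independent column uncertainty sets
# of strictly positive vectors, so the product family is positive on K.
F1 = [np.array([0.2, 0.5]), np.array([0.6, 0.2])]   # candidates for column 1
F2 = [np.array([0.4, 0.3]), np.array([0.1, 0.8])]   # candidates for column 2
family = [np.column_stack(cols) for cols in product(F1, F2)]   # 2^2 matrices

rho_max = max(spec_rad(A) for A in family)          # rho*(M) by full search

# Consequence of (3.5): sigma*(M) = rho*(M), so products of length 2 cannot
# exceed rho_max on the normalized spectral radius.
rho2 = max(spec_rad(A @ B) ** 0.5 for A in family for B in family)
assert rho2 <= rho_max + 1e-10
```

The full search used here is exactly the exponential enumeration that the methods of section 5 are designed to avoid.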
Corollary 1. For product families, the values ρ*(M) and ρ_*(M) do not change if we replace the set F by its convex hull. In any case, the optimal solutions of these problems are extreme points of the feasible sets.

Proof. Indeed, in the proof of Theorem 2 we need conditions (3.6) to be satisfied only by the extreme points F̄_j of the feasible sets F_j.

4. Extension to nonnegative product families. We are now going to extend the results of the previous section to the general product families of type (3.4), without the strict positivity assumption F_j ⊂ int K*. This will allow us to work with arbitrary nonnegative product families—in particular, with product families of sparse matrices. Let us take an arbitrary operator H, which is positive on K. Consider a positive family M_ε = {A + εH, A ∈ M}. In particular, for product families we take an arbitrary h ∈ int K* and define the perturbed uncertainty sets F_j^ε = F_j + εh, ε > 0. The corresponding family M_ε is positive. Now we are going to analyze the continuity of our characteristics as ε → 0.

Lemma 4. For any compact family M with an invariant cone, the function σ*(M_ε) is continuous in ε, and the function σ_*(M_ε) is upper semicontinuous in ε.

Proof. For any k denote H_k(ε) = sup_{B∈(M_ε)^k} ‖B‖^{1/k}, R_k(ε) = sup_{B∈(M_ε)^k} ρ^{1/k}(B), and h_k(ε) = inf_{B∈(M_ε)^k} ‖B‖^{1/k}. All these functions are continuous in ε for each k. Since σ*(M_ε) = inf_{k∈N} H_k(ε) and σ_*(M_ε) = inf_{k∈N} h_k(ε), both are upper semicontinuous (as pointwise infima of continuous functions). Moreover, since σ*(M_ε) = sup_{k∈N} R_k(ε), this function is also lower semicontinuous and therefore continuous.

Corollary 2. For an arbitrary family M with an invariant cone we have

σ*(M_ε) → σ*(M)   and   σ_*(M_ε) → σ_*(M)   as   ε → 0.
Proof. For the JSR, the statement follows directly from Lemma 4. For the LSR, note that each operator A + εH ∈ M_ε is nondecreasing in ε. Therefore, the function σ_*(M_ε) is nondecreasing in ε as well. Combining this observation with its semicontinuity (Lemma 4), we see that this function is continuous as ε → 0.

Theorem 3. For any family M with an invariant cone we have

λ*(M_ε) → λ*(M)   and   λ_*(M_ε) → λ_*(M)   as   ε → 0.
Proof. Since the value λ*(M_ε) is nondecreasing in ε, its continuity at zero is a direct consequence of the open domain in the definition (2.9) of λ*(M). Let us establish the second statement. Note that the function λ_*(M_ε) is nondecreasing in ε > 0. Assume it converges to q > λ_*(M) as ε → 0. Let us take an arbitrary λ in the interval (λ_*(M), q). For any ε > 0 we can find x_ε such that ‖x_ε‖ = 1 and A_ε x_ε − λx_ε ∈ K for all A_ε ∈ M_ε. By compactness of M, there exists a sequence ε_k → 0 such that the corresponding vectors x_k = x_{ε_k} converge to some x ∈ K, ‖x‖ = 1. Note that for any A_k ∈ M_{ε_k} we have A_k x_k − λx_k ∈ K. Taking the limit as k → ∞, we get Ax − λx ∈ K for all A ∈ M (we use the fact that x_k does not converge to zero). Thus, λ_*(M) ≥ λ, which contradicts our assumption.

Combining this with Theorem 2, we obtain the following result.

Theorem 4. Any product family (3.4) possesses property (3.5).

5. Computing the optimal spectral radius for product families. We see that for every product family M, the JSR σ*(M) is equal to the maximal spectral radius ρ*(M), and the LSR is equal to the minimal spectral radius. Thus, the problems of JSR/LSR computation, being extremely hard in general, are reduced for product families to finding the optimal spectral radius. Theorems 1 and 4 suggest at least two efficient strategies for optimizing the spectral radius.
5.1. Spectral simplex method. Let us assume either that all the sets F_j are finite or that they are polyhedral sets defined by systems of linear inequalities. Assume first that the product family M is positive. Consider the following strategy for maximizing the spectral radius (for minimizing the spectral radius, the scheme must be adapted in a straightforward manner).

Input: Choose an arbitrary F^0 ∈ F. Define A_0 = BF^0.
Iteration k ≥ 0:
  1. Compute the dominant eigenvector x_k of A_k = BF^k.
  2. Find j_k such that ⟨F^k_{j_k}, x_k⟩ < ⟨f̃^k_{j_k}, x_k⟩, where

(5.1)   f̃^k_j = arg max_{F_j ∈ F_j} ⟨F_j, x_k⟩,   j = 1, ..., N.

  3. If no such j_k is found, then STOP. Otherwise, set F^{k+1} = F^k + e_{j_k} (f̃^k_{j_k} − F^k_{j_k})^T.

In this scheme, e_j is the jth coordinate vector in R^N. By Theorem 1, we have ρ(A_{k+1}) > ρ(A_k). In the case of a polyhedral set F, we can choose F^0 to be its extreme point. Then all other vectors F^k will also be vertices of F. Since the number of vertices is finite, our algorithm terminates in finite time. The convergence and the rate of convergence of the spectral simplex method for general F are not yet known. If M is not necessarily positive, then we can take a perturbed family M_ε for arbitrarily small ε > 0 and run the spectral simplex method for it. The accuracy of such an approximation can be derived from the standard results on the continuity of eigenvalues.

5.2. Two-level optimization. As we have seen in section 2, the value λ*(M) can be characterized as the unique root of the decreasing function ξ*(λ). The value of the latter function is the optimal value of the convex optimization problem (2.11). Hence, it can be computed with any necessary accuracy. However, we are going to use this information in a localization procedure. Therefore, in order to estimate the complexity of the whole process, we need some bounds on the growth of the function ξ*(λ) in the neighborhood of its root.

Let Ā be the optimal solution of the maximization problem (2.1). Denote by x̄ ∈ Δ_{e*} ⊂ K its dominant eigenvector and by s̄ ∈ Δ*_e ⊂ K* the dominant eigenvector of the operator Ā*. Thus,

Ā x̄ = λ* x̄,   Ā* s̄ = λ* s̄,

where λ* := ρ*(M). Denote

γ_L = max_{x∈Δ_{e*}} ⟨s̄, x⟩,   μ_L = min_{x∈Δ_{e*}} ⟨s̄, x⟩,
γ_R = max_{s∈Δ*_e} ⟨s, x̄⟩,   μ_R = min_{s∈Δ*_e} ⟨s, x̄⟩.
Let us assume that μ_L > 0 and μ_R > 0 (this is guaranteed, for example, by Assumption 2 on the positivity of all operators A ∈ M).

Theorem 5. For any λ ∈ R we have

(5.2)   min{γ_L (λ* − λ), μ_L (λ* − λ)} ≤ ξ*(λ) ≤ max{γ_R (λ* − λ), μ_R (λ* − λ)}.

Proof. Indeed,

ξ*(λ) = inf_{x∈Δ_{e*}} max_{A∈M} max_{s∈Δ*_e} ⟨s, Ax − λx⟩ ≥ min_{x∈Δ_{e*}} ⟨s̄, Āx − λx⟩
      = min_{x∈Δ_{e*}} (λ* − λ) ⟨s̄, x⟩ = min{γ_L (λ* − λ), μ_L (λ* − λ)}.

Further, since λ* is a root of the function ξ*(λ), we have

0 = ψ*(x̄, λ*) = max_{A∈M} max_{s∈Δ*_e} ⟨s, Ax̄ − λ* x̄⟩.

This means that for all s ∈ K* and A ∈ M we have ⟨s, λ* x̄ − A x̄⟩ ≥ 0. Therefore,

λ* x̄ − A x̄ ∈ K   ∀A ∈ M.

Thus, we conclude that

ξ*(λ) = inf_{x∈Δ_{e*}} max_{A∈M} max_{s∈Δ*_e} ⟨s, Ax − λx⟩
      ≤ max_{A∈M} max_{s∈Δ*_e} ⟨s, (A x̄ − λ* x̄) + (λ* − λ) x̄⟩
      ≤ max_{s∈Δ*_e} ⟨s, (λ* − λ) x̄⟩ = max{γ_R (λ* − λ), μ_R (λ* − λ)}.

Now we can use the above bounds for justifying the complexity estimates of the following search procedure based on the golden section. Denote τ_1 := (3 − √5)/2 < τ_2 := (√5 − 1)/2.

(5.3)
Input: Choose λ_0 with ξ*(λ_0) > 0 and δ_0 > 0 such that ξ*(λ_0 + δ_0) < 0.
Iteration k ≥ 0:
  1. Define two trial values λ'_k = λ_k + τ_1 δ_k and λ''_k = λ_k + τ_2 δ_k.
  2. Start in parallel two processes for minimizing ψ*(·, λ'_k) and ψ*(·, λ''_k).
  3. Stop both processes after justification of one of the following statements:
     Case A: ξ*(λ'_k) > 0.   Case B: ξ*(λ''_k) < 0.
  4. Set δ_{k+1} = τ_2 δ_k. In Case A set λ_{k+1} = λ'_k; otherwise λ_{k+1} = λ_k.

The idea of this scheme is very simple. Since λ* ∈ [λ_k, λ_k + δ_k], we always have

max{|λ'_k − λ*|, |λ''_k − λ*|} ≥ max{τ_1, τ_2 − τ_1, 1 − τ_2} δ_k = τ_1 δ_k.
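The interval update in steps 1 and 4 can be sketched in isolation. This is a deliberate simplification of scheme (5.3): the sign of ξ* at a trial point is assumed to be available exactly, and only the left trial value is tested (the paper certifies each sign by a convex lower-level solver run in parallel). Names are ours:

```python
import math

tau1 = (3 - math.sqrt(5)) / 2        # tau_1, about 0.382
tau2 = (math.sqrt(5) - 1) / 2        # tau_2, about 0.618; note tau_1 + tau_2 = 1

def golden_root(xi, lam0, delta0, iters=60):
    """Root localization for a decreasing function xi on [lam0, lam0 + delta0],
    assuming xi(lam0) > 0 and xi(lam0 + delta0) < 0, with the golden-section
    interval update of scheme (5.3)."""
    lam, delta = lam0, delta0
    for _ in range(iters):
        lam1 = lam + tau1 * delta        # trial value lambda'_k
        if xi(lam1) > 0:                 # Case A: the root lies to the right of lam1
            lam = lam1
        # otherwise the root lies to the left of lam1; keep lam
        delta *= tau2                    # interval length shrinks by tau_2
    return lam                           # left endpoint; error at most final delta

# toy decreasing function with root lambda* = 1.25
root = golden_root(lambda l: 1.25 - l, 0.0, 2.0)
assert abs(root - 1.25) < 1e-9
```

After 60 iterations the bracket has length δ_0 τ_2^60, which is far below the 1e-9 tolerance asserted above.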
Therefore, in view of Theorem 5, the necessary accuracy for minimizing the functions ψ*(·, λ'_k) and ψ*(·, λ''_k) does not exceed O(δ_k). If we solve these problems by a polynomial-time method, which needs O(p(n) ln(1/ε)) iterations for generating an ε-solution of a convex problem, then at the kth iteration of the scheme (5.3) we run this method for at most N_k = O(p(n) ln(1/δ_k)) = O(p(n) k) steps. Hence, k steps of the scheme (5.3) need at most O(p(n) k²) iterations of the lower-level method. Since the total number of iterations of the upper-level process is bounded by O(ln(1/ε)), where ε is the target accuracy of approximation of λ*, we conclude that our scheme (5.3) has polynomial-time complexity.

We can apply a similar technique for approximating λ_*(M). Since all the reasoning is very similar to that of the maximization case, we omit the proofs. Let Â be the optimal solution of the minimization problem (2.2). Denote by x̂ ∈ Δ_{e*} ⊂ K its dominant eigenvector and by ŝ ∈ Δ*_e ⊂ K* the dominant eigenvector of the operator Â*. Denote λ_* := ρ_*(M), and

γ̂_L = max_{x∈Δ_{e*}} ⟨ŝ, x⟩,   μ̂_L = min_{x∈Δ_{e*}} ⟨ŝ, x⟩,
γ̂_R = max_{s∈Δ*_e} ⟨s, x̂⟩,   μ̂_R = min_{s∈Δ*_e} ⟨s, x̂⟩.

Let us assume that μ̂_L > 0 and μ̂_R > 0.

Theorem 6. For any λ ∈ R we have

(5.4)   min{γ̂_L (λ − λ_*), μ̂_L (λ − λ_*)} ≤ ξ_*(λ) ≤ max{γ̂_R (λ − λ_*), μ̂_R (λ − λ_*)}.
Thus, in order to compute an approximation to λ_*(M), we can apply a version of method (5.3) subject to appropriate modifications.

6. Application examples. In this section, we consider two important applications related to the optimization of the spectral radius. The first deals with nonstationary difference equations, and the second consists in optimizing the spectral radius of the adjacency matrix of a graph. Of course, this list is far from complete. We believe our results may be applicable to the stability analysis of linear desynchronized systems [8, 20], to the study of the Leontief input-output model [10], to optimal investment problems in Cantor–Lippman-type models [3], and to some models of mathematical ecology [12], among other research areas.

6.1. Nonstationary difference equations. For the sake of simplicity, we focus on univariate difference equations, keeping in mind that the results below can be easily extended to the multivariate case. Consider a finite difference equation in the form
(6.1)   x_k = Σ_{j=1}^m q_j x_{k−j},   k ≥ m + 1,

where q̄ = (q_1, ..., q_m) is a given vector of coefficients. It is well known that the general solution {x_k} of this equation can be written explicitly in terms of the roots of the characteristic polynomial. All solutions form an m-dimensional linear space, and any particular solution is uniquely defined by the initial conditions, i.e., by the values x_1, ..., x_m.
The largest and the smallest possible rates of asymptotic growth of the sequence {x_k} are k^{μ−1}|λ_1|^k and |λ_m|^k, respectively, where λ_1 and λ_m are the largest and smallest (in absolute value) roots of the characteristic polynomial and μ is the multiplicity of the root λ_1 (see, e.g., [13, 15]). Now assume that the sequence of coefficients q̄ is not fixed, and at any step k of the recurrence we can choose independently any q̄ from a given uncertainty set Q ⊂ R^m. The question is as follows: What are the largest and the smallest possible rates of asymptotic growth of the sequence {x_k}? By the rate of growth we understand the value lim sup_{k→∞} (1/k) ln |x_k|. What strategy of choosing the vector coefficients q̄_k ∈ Q provides the largest and, respectively, the smallest growth?

These questions, which are very difficult in general, have a simple solution in the case when all vectors q̄ = (q_1, …, q_m) ∈ Q are nonnegative. Let Q be a given compact subset of R^m_+, and let the initial values x_1, …, x_m be positive. Actually, we assume positivity of the starting values only for the sake of simplicity; a similar analysis can be carried out for arbitrary initial data. In order to formulate the statement, we need some notation. We say that a sequence {q̄_i} is stationary if the q̄_i are the same for all i. Furthermore, for any q̄ ∈ Q we write A_q̄ for the m × m matrix whose first row is q̄ and whose other rows are, respectively, e_1, …, e_{m−1} (the coordinate row vectors):

$$
A_{\bar q} \;=\;
\begin{pmatrix}
q_1 & q_2 & q_3 & \dots & q_{m-1} & q_m \\
1 & 0 & 0 & \dots & 0 & 0 \\
0 & 1 & 0 & \dots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \dots & 1 & 0
\end{pmatrix}.
$$
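For concreteness, the following small sketch (not from the paper; the coefficient vector is an arbitrary illustrative choice) assembles A_q̄ and estimates its spectral radius by power iteration, which is justified here because A_q̄ is nonnegative and, for strictly positive q̄, primitive:

```python
# Illustrative sketch: assemble A_q (first row q, ones on the subdiagonal)
# and estimate rho(A_q) by power iteration on the positive cone.

def companion(q):
    m = len(q)
    A = [[0.0] * m for _ in range(m)]
    A[0] = list(q)                      # first row is the vector q
    for i in range(1, m):
        A[i][i - 1] = 1.0               # remaining rows are e_1, ..., e_{m-1}
    return A

def spectral_radius(A, iters=500):
    v = [1.0] * len(A)
    lam = 1.0
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        lam = max(w)                    # converges to the Perron eigenvalue
        v = [x / lam for x in w]
    return lam

A = companion([0.5, 0.3, 0.4])          # hypothetical q, m = 3
rho = spectral_radius(A)                # rho solves t^3 = 0.5 t^2 + 0.3 t + 0.4
```

The returned value satisfies the characteristic equation of A_q̄, as the power iteration converges geometrically at the rate |λ_2/λ_1|.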
Denote M = {A_q̄ : q̄ ∈ Q}.

Proposition 1. Under the assumptions above, the largest and smallest rates of growth of the sequence {x_k} are both achieved at stationary sequences. For the largest rate of growth we have x_k ∼ λ_1^k, and for the smallest we have x_k ∼ λ_0^k, where λ_1 = ρ(A_{q̄_1}) = max_{q̄∈Q} ρ(A_q̄) and λ_0 = ρ(A_{q̄_0}) = min_{q̄∈Q} ρ(A_q̄).

Proof. Denote y_k = (x_k, …, x_{k−m+1}), k ≥ m. Then (6.1) can be written as y_k = A_q̄ y_{k−1}. Therefore, y_k = A_{k−m} ⋯ A_1 y_1, where we use the short notation A_j = A_{q̄_j}. For any norm in R^m, the largest value of ‖A_{k−m} ⋯ A_1 y_1‖ over all matrices A_1, …, A_{k−m} ∈ M is asymptotically equivalent to [σ*(M)]^{k−m}, and the smallest value is equivalent to [σ_*(M)]^{k−m} [16]. Therefore, the largest and the smallest x_k are asymptotically proportional to these values and hence to [σ*(M)]^k and [σ_*(M)]^k, respectively. On the other hand, the family M has the product structure with a single uncertainty set F_1 = Q for the first row (the other rows of the matrices are fixed). Applying now Theorem 4, we conclude the proof.

Thus, in the case of a nonnegative uncertainty set of coefficients for (6.1), no mixing strategy, based on different vectors q̄, is optimal. The fastest growth of the sequence {x_k} is attained at a stationary sequence, i.e., when we take the same vector of coefficients at every step. This vector corresponds to the maximal spectral radius over the matrices A ∈ M and can be found by the technique discussed in section 5. The rate of growth λ_1 can be found by solving the optimization problem (2.1). For the slowest growth, we can apply a similar technique.

It is interesting that in our situation the problems (2.1) and (2.2) become very simple. Note that the Perron–Frobenius eigenvector of the matrix A_q̄ is v(q̄) = (λ^{m−1}(q̄), …, λ(q̄), 1), where λ(q̄) = ρ(A_q̄) is the only positive root of the characteristic polynomial
$$p_{\bar q}(\lambda) \;\stackrel{\mathrm{def}}{=}\; \lambda^m - \sum_{j=1}^{m} q_j \lambda^{m-j}$$

(we assume q̄ ≠ 0). Indeed, dividing the equation p_q̄(λ) = 0 by λ^m, we get the equivalent equation

$$\sum_{j=1}^{m} q_j \lambda^{-j} = 1$$
with strictly decreasing left-hand side.

Lemma 5. The function λ(q̄) is both quasiconvex and quasiconcave in q̄ ≥ 0.

Proof. Let λ(q̄) ≤ t. This happens if and only if $\sum_{j=1}^{m} q_j t^{-j} \le 1$. Thus, the level sets of this function are convex, and hence λ(q̄) is quasiconvex. The quasiconcavity of λ(q̄) is proved in a similar way.

Recall that the polynomial p_q̄(·) is called stable if λ(q̄) ≤ 1.

Corollary 3. A polynomial p_q̄(·) with q̄ ≥ 0 is stable if and only if $\sum_{j=1}^{m} q_j \le 1$.
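The stability criterion is easy to test numerically. The sketch below (illustrative, not from the paper) computes λ(q̄) by bisection on the strictly decreasing function above and checks the criterion on random nonnegative vectors, skipping samples too close to the boundary Σ q_j = 1 where floating-point comparisons can tie:

```python
# Illustrative numerical check of Corollary 3: for q >= 0, lambda(q) <= 1
# if and only if q_1 + ... + q_m <= 1. Random samples near the boundary
# sum(q) = 1 are skipped to avoid floating-point ties.
import random

def lam(q):
    """Positive root of t^m = sum_j q_j t^{m-j}, by bisection on the
    strictly decreasing function g(t) = sum_j q_j t^{-j}."""
    g = lambda t: sum(qj * t ** -(j + 1) for j, qj in enumerate(q))
    lo, hi = 1e-9, 1.0
    while g(hi) > 1.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 1.0 else (lo, mid)
    return 0.5 * (lo + hi)

random.seed(0)
for _ in range(1000):
    q = [random.random() for _ in range(4)]
    if abs(sum(q) - 1.0) < 1e-6:
        continue
    assert (lam(q) <= 1.0) == (sum(q) <= 1.0)
```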
Example 2. For m = 2, we have $\lambda(\bar q) = \frac{q_1}{2} + \sqrt{q_2 + \frac{q_1^2}{4}}$. It is easy to check that

$$\{\bar q \in \mathbb{R}^2_+ : \lambda(\bar q) \le 1\} \;\equiv\; \{\bar q \in \mathbb{R}^2_+ : q_1 + q_2 \le 1\}.$$
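The closed form above can also be checked directly. The sketch below (illustrative values, not from the paper) verifies that λ(q̄) solves λ² = q₁λ + q₂ and that the two level sets coincide on random samples from the nonnegative orthant:

```python
# Illustrative check of Example 2 (m = 2): lambda(q) = q_1/2 + sqrt(q_2 + q_1^2/4)
# solves lambda^2 = q_1 lambda + q_2, and its unit level set coincides with
# {q_1 + q_2 <= 1} on the nonnegative orthant.
import math, random

def lam2(q1, q2):
    return q1 / 2.0 + math.sqrt(q2 + q1 * q1 / 4.0)

random.seed(1)
for _ in range(1000):
    q1, q2 = random.random(), random.random()
    L = lam2(q1, q2)
    assert abs(L * L - (q1 * L + q2)) < 1e-9      # L is a root of t^2 - q1 t - q2
    if abs(q1 + q2 - 1.0) > 1e-6:                 # avoid boundary ties
        assert (L <= 1.0) == (q1 + q2 <= 1.0)
```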
For m > 2, such a verification is less straightforward. Thus, in our situation, both problems (2.1) and (2.2) can be solved by the standard tools of convex optimization.

6.2. Optimizing the spectral radius for graphs. One of the classical problems of spectral graph theory consists of finding the largest (or smallest) possible spectral radius of a graph under some conditions (level of connectivity, bounds for the weighted sums of edges and the largest degrees of some vertices, etc.). The spectral radius of a graph is by definition the spectral radius of its adjacency matrix A (A_ij = 1 if there is an edge from vertex j to vertex i; otherwise A_ij = 0). Most of the results on optimizing the graph spectral radius concern undirected graphs, which have symmetric adjacency matrices [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. At the same time, there exist well-known results on directed graphs, related to maximization of the spectral radius of arbitrary 0–1 matrices [2, 6]. Since adjacency matrices are nonnegative, we can apply our results both to maximizing and to minimizing the spectral radius of graphs. In both cases, we get an approximate optimal value of our problem. However, if the set of graphs possesses a product structure property (either by rows or by columns), we can approach an exact optimal value. We give two examples of problems for directed graphs which can be solved numerically by applying our technique.

Problem 1. Consider the set of vertices g_1, …, g_n. For each vertex g_i we are given a finite family F_i of subsets of these vertices. Consider the set of all directed graphs such that the set of incoming edges (more precisely, the set of the corresponding adjacent vertices) of any vertex g_i belongs to F_i, i = 1, …, n. The problem is to compute the largest and the smallest spectral radii for graphs of this set.

Problem 2. We are given a set of vertices g_1, …, g_n and a set of nonnegative integers d_1, …, d_n. The problem is to find graphs with the largest and the smallest spectral radii among all directed graphs such that the number of incoming edges of any vertex g_i equals d_i, i = 1, …, n.

Replacing rows by columns, we obtain similar problems for the sets of outgoing edges instead of incoming edges. The set of adjacency matrices in both cases has a product structure. For the first problem all uncertainty sets F_i are finite; for
the second they are polyhedral and given by the system of inequalities $F_i = \{x \in \mathbb{R}^n : \sum_{k=1}^{n} x_k \le d_i,\; 0 \le x_k \le 1,\; k = 1, \dots, n\}$. The minimal and maximal spectral radii are both attained at extreme points of the uncertainty sets (Corollary 1), i.e., precisely when the ith row has d_i entries equal to one and all other entries equal to zero. To conclude with examples, let us mention that there is a wide area of applications of our results related to Markov processes on graphs.

Acknowledgments. This research was carried out when the second author was visiting CORE and the Université Catholique de Louvain, Belgium. The author would like to thank both institutions for their hospitality. Finally, we would like to thank the two anonymous referees for their careful reading and very useful suggestions.

REFERENCES

[1] V. D. Blondel and Yu. Nesterov, Polynomial-time computation of the joint spectral radius for some sets of nonnegative matrices, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 865–876.
[2] R. A. Brualdi and A. J. Hoffman, On the spectral radius of (0,1)-matrices, Linear Algebra Appl., 65 (1985), pp. 133–146.
[3] D. G. Cantor and S. A. Lippman, Optimal investment selection with a multiple of projects, Econometrica, 63 (1995), pp. 1231–1240.
[4] D. Cvetković, M. Doob, and H. Sachs, Spectra of Graphs: Theory and Application, Academic Press, New York, 1980.
[5] Y.-Z. Fan, B. S. Tam, and J. Zhou, Maximizing spectral radius of unoriented Laplacian matrix over bicyclic graphs of a given order, Linear Multilinear Algebra, 56 (2008), pp. 381–397.
[6] S. Friedland, The maximal eigenvalue of 0-1 matrices with prescribed number of ones, Linear Algebra Appl., 69 (1985), pp. 33–69.
[7] Y. Hong, J. L. Shu, and K. Fang, A sharp upper bound of the spectral radius of graphs, J. Combin. Theory Ser. B, 81 (2001), pp. 177–183.
[8] V. Kozyakin, A short introduction to asynchronous systems, in Proceedings of the Sixth International Conference on Difference Equations (Augsburg, Germany, 2001): New Progress in Difference Equations, B. Aulbach, S. Elaydi, and G. Ladas, eds., CRC Press, Boca Raton, FL, 2004, pp. 153–166.
[9] M. G. Kreĭn and M. A. Rutman, Linear Operators Leaving Invariant a Cone in a Banach Space, Amer. Math. Soc. Transl. 26, AMS, Providence, RI, 1950.
[10] W. Leontief, Input-Output Economics, 2nd ed., Oxford University Press, New York, 1986.
[11] B. Liu, On an upper bound of the spectral radius of graphs, Discrete Math., 308 (2008), pp. 5317–5324.
[12] D. O. Logofet, Markov chains as succession models: New perspectives of the classic paradigm, Lesovedenie, 2 (2010), pp. 46–59.
[13] K. S. Miller, Linear Difference Equations, Elsevier, Amsterdam, 2006.
[14] D. D. Olesky, A. Roy, and P. van den Driessche, Maximal graphs and graphs with maximal spectral radius, Linear Algebra Appl., 346 (2002), pp. 109–130.
[15] B. G. Pachpatte, Integral and Finite Difference Inequalities and Applications, W. A. Benjamin, Inc., New York, Amsterdam, 1968.
[16] V. Yu. Protasov, Asymptotics of the partition function, Sb. Math., 191 (2000), pp. 381–414.
[17] V. Yu. Protasov, When do several linear operators share an invariant cone?, Linear Algebra Appl., 433 (2010), pp. 781–789.
[18] V. Yu. Protasov, R. M. Jungers, and V. D. Blondel, Joint spectral characteristics of matrices: A conic programming approach, SIAM J. Matrix Anal. Appl., 31 (2010), pp. 2146–2162.
[19] L. Rodman, H. Seyalioglu, and I. M. Spitkovsky, On common invariant cones for families of matrices, Linear Algebra Appl., 432 (2010), pp. 911–926.
[20] A. G. Vladimirov, N. Grechishkina, V. Kozyakin, N. Kuznetsov, A. Pokrovskii, and D. Rachinskii, Asynchronous systems: Theory and practice, Inform. Processes, 11 (2011), pp. 1–45.
[21] M. Zhai, R. Liu, and J. Shu, On the spectral radius of bipartite graphs with given diameter, Linear Algebra Appl., 430 (2009), pp. 1165–1170.