
Joint 48th IEEE Conference on Decision and Control and 28th Chinese Control Conference Shanghai, P.R. China, December 16-18, 2009


Adaptive Randomized Algorithm for Finding Eigenvector of Stochastic Matrix with Application to PageRank

Alexander Nazin and Boris Polyak

A. Nazin and B. Polyak are with the Laboratory for Adaptive and Robust Control Systems, Institute of Control Sciences RAS, 65 Profsoyuznaya str., 117997 Moscow, Russia. E-mails: [email protected] and [email protected]

Abstract— The problem of finding the eigenvector corresponding to the largest eigenvalue of a stochastic matrix has numerous applications in ranking search results, multi-agent consensus, networked control and data mining. The well-known power method is a typical tool for its solution. However, randomized methods can compete with the standard ones: they require much less computation per iteration and are well-tailored for distributed computations. We propose a novel adaptive randomized algorithm and provide an explicit upper bound $O(\sqrt{\ln N / n})$ for its rate of convergence, where $N$ is the dimension and $n$ is the number of iterations. The bound looks promising because $\sqrt{\ln N}$ is not large even for very high dimensions. The proposed algorithm is based on the mirror-descent method for convex stochastic optimization.

I. INTRODUCTION

Let $A$ be a stochastic $N \times N$ matrix, i.e., its columns are contained in the standard simplex $\Theta_N$. Then it has the largest eigenvalue equal to 1 and a real eigenvector $x_* \in \Theta_N$ corresponding to this eigenvalue, i.e., $Ax_* = x_*$. Finding this eigenvector is a basic problem in numerical analysis and in numerous applications arising in ranking search results, multi-agent consensus, networked control and data mining; see, e.g., [1], Section 5. It suffices to mention the famous PageRank problem of ranking web pages, which is the basis for Google's search engine rankings; see the original paper [2] and the recent monograph [3], where numerous references can be found. What is typical for such applications is their high dimension: for web page ranking, $N$ equals several billion. The standard technique for solving the problem is based on the power method [3]. Starting from an initial approximation $x_0$, this method calculates iteratively $x_{n+1} = A x_n$, and under some assumptions (e.g., if $A$ has all positive entries) $\lim_{n\to\infty} x_n = x_*$, while $x_*$ is unique in this case. The method is very simple and converges with geometric rate; see details in [3]. Calculation of $A x_n$ can be performed easily because in typical applications $A$ is sparse. Nevertheless, for huge $N$ this method requires serious computational effort; it is reported that PageRank computation for the web takes about a week on supercomputers. With this in mind, several approaches have been proposed recently. Some of them exploit the block structure of the matrix, others use distributed calculations with several computers, and so on. Another line of research on this topic is based on randomized algorithms; see the recent works [1], [4] and references therein. Such an approach has several advantages.


First, each iteration is much cheaper computationally than that of the power method (see Remark 1 below). Second, these algorithms can be easily adapted to parallel calculations. Moreover, randomized algorithms play a significant role in modern control and optimization approaches, compare [5]. In the present paper we follow this line of research. We propose an iterative randomized method for minimization of $\|Ax - x\|_2^2$, the squared Euclidean norm. At each step it calculates a stochastic gradient of this function, which exploits one row and one column of the matrix $A$, chosen randomly according to the values $x_n$ considered as probability vectors. The method is a version of the general stochastic mirror-descent method (or primal-dual convex stochastic optimization method), whose idea goes back to [6] and was developed in [7], [8], and [9]. We get the bound $\mathbb{E}\|A\bar{x}_n - \bar{x}_n\|_2^2 \le 6\sqrt{(n+1)\ln N}/n$, where $\bar{x}_n$ is the result obtained at the $n$-th iteration. The dependence on the dimension $N$ is highly promising: even for $N = 10^9$ we have $\sqrt{\ln N} < 5$, a small enough quantity. Moreover, the bound is valid for all stochastic matrices; it does not depend on the properties of a particular $A$. In particular, it does not depend on the second eigenvalue of the matrix, while for the power method it does. Thus one can avoid the transformation of the matrix $A$ into the matrix $M$ (see Section VII below) by use of a parameter $m$. It is of interest to compare our bound with the one in (30) of [1], which reads in our notation as $\mathbb{E}\|x_n - x_*\|_2^2 = O(N^{3/2}/n)$. In this case the dependence on $n$ is better, but the dependence on the dimension looks hopeless: $O(N^{3/2})$. It is worth emphasizing that our bound holds under very mild assumptions; for instance, we do not suppose the uniqueness of $x_* \in \Theta_N$.

The paper is organized as follows. The problem formulation is given in Section II; some preliminary constructions are provided in Section III. Section IV contains the description of the proposed algorithm and the main result on its convergence rate. A sketch of the proof is given in Section V. A discussion of the numerical implementation of the algorithm and of its application to PageRank can be found in Sections VI and VII.

Previous versions of the non-adaptive randomized algorithm were presented in [10], [11], [12]. Those versions treat the algorithm with fixed parameters, while here we consider an adaptive estimation strategy and prove an improved upper bound.
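For reference, here is a minimal sketch of the power iteration $x_{n+1} = A x_n$ mentioned above, used only as a baseline for comparison; the function name and the dense-matrix representation are illustrative assumptions, not part of the paper.

```python
import numpy as np

def power_method(A, n_iter=100):
    """Plain power iteration x_{k+1} = A x_k for a column-stochastic matrix A.

    Baseline sketch only: assumes a dense NumPy array whose columns each sum
    to one, so every iterate stays in the simplex.
    """
    N = A.shape[0]
    x = np.full(N, 1.0 / N)   # start from the uniform distribution
    for _ in range(n_iter):
        x = A @ x             # one O(N^2) multiplication for a dense A
    return x
```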


II. PROBLEM STATEMENT

Let a stochastic matrix $A = (a_{ij})_{N\times N}$ be given by its stochastic columns $A^{(j)} = (a_{1j}, \dots, a_{Nj})^T$, i.e.,
\[
\sum_{i=1}^{N} a_{ij} = 1, \quad j = 1, \dots, N; \qquad a_{ij} \ge 0 \;\;\forall\, i, j. \tag{1}
\]
Denote by $\mathcal{A}_N$ the set of all stochastic $N \times N$ matrices. Together with the columns $A^{(j)}$ we also define the matrix rows $A_{(i)} = (a_{i1}, \dots, a_{iN})$, $i = 1, \dots, N$. Introduce the standard simplex $\Theta_N \subset \mathbb{R}^N$ containing all vectors $\theta = (\theta_1, \dots, \theta_N)^T$ which satisfy the conditions
\[
\sum_{i=1}^{N} \theta_i = 1, \qquad \theta_i \ge 0 \;\;\forall\, i. \tag{2}
\]
Consider the system of linear equations
\[
Ax = x, \qquad x \in \mathbb{R}^N. \tag{3}
\]
By the Perron–Frobenius theorem, there exists at least one solution $x_* \in \Theta_N$ of the system (3). The set $X_*$ of all such solutions forms a convex compact set
\[
X_* = \operatorname*{Argmin}_{x \in \Theta_N} \|Ax - x\|_2^2 = \{x \in \Theta_N : Ax = x\}. \tag{4}
\]
If the matrix $A$ is strongly connected (or, equivalently, irreducible), the set $X_*$ is a singleton, i.e. the eigenvector $x_*$ corresponding to the largest in absolute value eigenvalue (equal to one) is unique. Under the stronger assumption of positivity of all entries of $A$, the convergence of the power method can be guaranteed. For instance, the power method does not converge for $N = 2$, $a_{11} = a_{22} = 0$, $a_{21} = a_{12} = 1$. We consider the general situation without assumptions of irreducibility or positivity of $A$. Our goal is to find a point in $X_*$, i.e., to minimize
\[
Q(x) = \tfrac{1}{2}\,\|Ax - x\|_2^2, \qquad x \in \Theta_N. \tag{5}
\]
The idea of the iterative algorithm for minimization of $Q(x)$ is to construct a stochastic gradient of this function by a random choice of rows and columns of $A$ and then to apply the mirror descent method for minimization over $\Theta_N$. Moreover, the gradient of the function (5) is
\[
\nabla Q(x) = (A - I)^T (A - I)\,x = A^T A x - A^T x - A x + x. \tag{6}
\]

III. PRELIMINARY CONSTRUCTIONS

A. Stochastic gradient

The idea to construct a stochastic realization of $Ax$ by generating $A^{(j)}$ with probability $x^{(j)}$ was proposed in [9]. We exploit this idea, adapting it to our problem. Let the estimate $x_k = (x_k^{(1)}, \dots, x_k^{(N)})^T \in \Theta_N$ be obtained at iteration $k$. Then the vector
\[
A x_k = \sum_{j=1}^{N} A^{(j)} x_k^{(j)} \tag{7}
\]
may be treated as a conditional expectation of the vector $A^{(\eta_k)}$ under a random index $\eta_k \in \{1, \dots, N\}$ with the conditional probability distribution $(x_k^{(1)}, \dots, x_k^{(N)})$, i.e.
\[
\mathbb{P}(\eta_k = j \mid x_k) = x_k^{(j)}, \qquad j = 1, \dots, N. \tag{8}
\]
Choose the second index $\xi_k \in \{1, \dots, N\}$ at random by the conditional probability distribution $(a_{1\eta_k}, \dots, a_{N\eta_k})$, i.e. using the stochastic vector $A^{(\eta_k)}$:
\[
\mathbb{P}(\xi_k = i \mid x_k, \eta_k) = a_{i\eta_k}, \qquad i = 1, \dots, N. \tag{9}
\]
Thus, by a sequential random choice of the two indices $\eta_k$ and $\xi_k$, we form the realization of the stochastic gradient at the current iteration, i.e.
\[
\zeta_k \triangleq \bigl(A_{(\xi_k)}\bigr)^T - A^{(\eta_k)} - \bigl(A_{(\eta_k)}\bigr)^T + x_k. \tag{10}
\]
Consequently, we have
\[
\mathbb{E}(\zeta_k \mid x_k) = \nabla Q(x_k). \tag{11}
\]
Indeed, the LHS of the above equality is represented in detail as follows (using the properties of conditional expectation and (8), (9)):
\begin{align}
\mathbb{E}(\zeta_k \mid x_k)
&= \mathbb{E}\{\mathbb{E}(\zeta_k \mid x_k, \eta_k) \mid x_k\} \tag{12}\\
&= \mathbb{E}\bigl\{\mathbb{E}\bigl[(A_{(\xi_k)})^T \mid x_k, \eta_k\bigr] \mid x_k\bigr\}
   + x_k - \mathbb{E}\bigl[A^{(\eta_k)} + (A_{(\eta_k)})^T \mid x_k\bigr] \tag{13--15}\\
&= \mathbb{E}\Bigl\{\sum_{i=1}^{N} a_{i\eta_k}\,(A_{(i)})^T \,\Big|\, x_k\Bigr\}
   + x_k - \sum_{j=1}^{N} x_k^{(j)}\bigl[A^{(j)} + (A_{(j)})^T\bigr] \tag{16}\\
&= \sum_{i=1}^{N}\sum_{j=1}^{N} x_k^{(j)}\, a_{ij}\,(A_{(i)})^T + x_k - A^T x_k - A x_k \tag{17}\\
&= (A^T A - A^T - A + I)\,x_k = \nabla Q(x_k). \tag{18}
\end{align}
Observe a significant property of the stochastic gradient $\zeta_k$ in (10): its $\infty$-norm is bounded by 2. Indeed,
\[
\|\zeta_k\|_\infty \le \bigl\|(A_{(\xi_k)})^T - (A_{(\eta_k)})^T\bigr\|_\infty + \bigl\|x_k - A^{(\eta_k)}\bigr\|_\infty \le 2. \tag{19}
\]
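For illustration, one draw of the stochastic gradient (8)–(10) might look as follows in Python; the function name, the dense-array representation of $A$, and the NumPy generator are illustrative assumptions (in applications only one row and one column of a sparse $A$ would actually be touched).

```python
import numpy as np

def stochastic_gradient(A, x, rng):
    """One realization zeta_k of the stochastic gradient from (8)-(10).

    Sketch under illustrative assumptions: A is a dense column-stochastic
    NumPy array and x lies in the simplex.
    """
    N = A.shape[0]
    eta = rng.choice(N, p=x)         # P(eta = j | x) = x^(j),        eq. (8)
    xi = rng.choice(N, p=A[:, eta])  # P(xi = i | x, eta) = a_{i,eta}, eq. (9)
    # zeta = (A_(xi))^T - A^(eta) - (A_(eta))^T + x,                   eq. (10)
    return A[xi, :] - A[:, eta] - A[eta, :] + x
```

In expectation over the two indices this returns $\nabla Q(x)$, and each entry of the output lies in $[-2, 2]$, in agreement with (11) and (19).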

B. Proxy function

Let us recall some required definitions and facts from convex analysis [13]; see also [8], from which we adapt this material. Introduce $E = \ell_1^N$, the primal space $\mathbb{R}^N$ equipped with the norm $\|z\|_1 = \sum_{j=1}^{N} |z^{(j)}|$, where $z = (z^{(1)}, \dots, z^{(N)})^T$. Denote by $E^* = \ell_\infty^N$ the dual space, i.e. $\mathbb{R}^N$ equipped with the dual norm $\|z\|_\infty = \max_{\|\theta\|_1 = 1} z^T\theta = \max_{1\le j\le N} |z^{(j)}|$, $z \in E^*$. Let $\Theta$ be a convex closed subset of $E$. Introduce a parameter $\beta > 0$ and a convex function $V : \Theta \to \mathbb{R}$. We call the Legendre–Fenchel transform of $\beta V$ the $\beta$-conjugate function to $V$:
\[
W_\beta(z) = \sup_{\theta \in \Theta}\bigl\{-z^T\theta - \beta V(\theta)\bigr\}, \qquad \forall\, z \in E^*. \tag{20}
\]
Further, we introduce the key assumption (a Lipschitz condition in the conjugate norms $\|\cdot\|_1$ and $\|\cdot\|_\infty$) that will be used in the proof of Theorem 1.

Assumption (L). A convex function $V : \Theta \to \mathbb{R}$ is such that its $\beta$-conjugate $W_\beta$ is continuously differentiable on $E^*$ and its gradient $\nabla W_\beta$ satisfies the inequality
\[
\|\nabla W_\beta(z) - \nabla W_\beta(\tilde z)\|_1 \le \frac{1}{\alpha\beta}\,\|z - \tilde z\|_\infty, \qquad \forall\, z, \tilde z \in E^*, \;\; \beta > 0,
\]
where $\alpha > 0$ is a constant independent of $\beta$.

As is known (see, e.g., [13], [14]), this assumption is related to the notion of strong convexity w.r.t. the $\|\cdot\|_1$-norm.

Definition 1: Fix $\alpha > 0$. A convex function $V : \Theta \to \mathbb{R}$ is said to be $\alpha$-strongly convex with respect to the norm $\|\cdot\|_1$ if
\[
V(sx + (1-s)y) \le sV(x) + (1-s)V(y) - \frac{\alpha}{2}\,s(1-s)\,\|x - y\|_1^2 \tag{21}
\]
for all $x, y \in \Theta$ and any $s \in [0,1]$. □

The following proposition sums up some properties of $\beta$-conjugates and, in particular, yields a sufficient condition for Assumption (L).

Proposition 1: Let the function $V : \Theta \to \mathbb{R}$ be convex and the parameter $\beta$ be positive. Then the $\beta$-conjugate $W_\beta$ of $V$ has the following properties.
1) The function $W_\beta : E^* \to \mathbb{R}$ is convex and has the conjugate $\beta V$, i.e.,
\[
\beta V(\theta) = \sup_{z \in E^*}\bigl\{-z^T\theta - W_\beta(z)\bigr\}, \qquad \forall\, \theta \in \Theta.
\]
2) If the function $V$ is $\alpha$-strongly convex w.r.t. the norm $\|\cdot\|_1$, then
(i) Assumption (L) holds true,
(ii) $\operatorname*{argmax}_{\theta\in\Theta}\{-z^T\theta - \beta V(\theta)\} = -\nabla W_\beta(z) \in \Theta$. □

For a proof of this proposition we refer to [13], [14].

Definition 2: We call a function $V : \Theta \to \mathbb{R}_+$ a proxy function if it is convex and
(i) there exists a point $\theta_* \in \Theta$ such that $\min_{\theta\in\Theta} V(\theta) = V(\theta_*)$,
(ii) Assumption (L) holds true. □

C. Example

Let $\Theta = \Theta_N$. Consider the entropy-type proxy function
\[
V(\theta) = \ln N + \sum_{j=1}^{N} \theta^{(j)}\ln\theta^{(j)}, \qquad \forall\, \theta \in \Theta_N \tag{22}
\]
(where $0\ln 0 \triangleq 0$), which has the single minimizer $\theta_* = (1/N, \dots, 1/N)^T$ with $V(\theta_*) = 0$. It is easy to check that this function is $\alpha$-strongly convex w.r.t. the norm $\|\cdot\|_1$ with parameter $\alpha = 1$. Hence, Assumption (L) holds true by Proposition 1 (see, e.g., the Appendix in [8] for a direct proof). An important property of this choice of $V$ is that the optimization problem (20) can be solved explicitly, so that $W_\beta$ and $\nabla W_\beta$ are given by the following formulas: for all $z \in E^*$,
\[
W_\beta(z) = \beta\ln\Bigl(\frac{1}{N}\sum_{k=1}^{N} e^{-z^{(k)}/\beta}\Bigr), \tag{23}
\]
\[
\frac{\partial W_\beta(z)}{\partial z^{(j)}} = -e^{-z^{(j)}/\beta}\Bigl(\sum_{k=1}^{N} e^{-z^{(k)}/\beta}\Bigr)^{-1}, \qquad \forall\, j. \tag{24}
\]
Note that the following holds true:
– the proxy function (22) equals the Kullback–Leibler information divergence between the uniform distribution on the set $\{1, \dots, N\}$ and the distribution on the same set defined by the probabilities $\theta^{(j)}$, $j = 1, \dots, N$;
– in view of (24), the components of the vector $-\nabla W_\beta(z)$ define a Gibbs distribution on the coordinates of the vector $z$, with $\beta$ interpreted as a temperature parameter.
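In computational terms, evaluating $-\nabla W_\beta(z)$ in (24) is just a softmax of $-z/\beta$ and costs $O(N)$ operations. A minimal sketch (the function name is an illustrative assumption, and the max-shift is a standard numerical-stability trick not discussed in the paper):

```python
import numpy as np

def neg_grad_W(z, beta):
    """Return -grad W_beta(z), i.e. the Gibbs distribution of eq. (24).

    Sketch: a numerically stable softmax of -z/beta, which is the mirror map
    onto the simplex induced by the entropy proxy function (22).
    """
    w = -z / beta
    w = w - w.max()        # shift so the largest exponent is 0
    e = np.exp(w)
    return e / e.sum()
```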

IV. ADAPTIVE RANDOMIZED ALGORITHM

A. General form

The algorithm below can be constructed for various proxy functions; however, for the sake of clearness and simplicity, we treat only the entropy-type function (22) and the related Gibbs potential (24). Let $x_k$ be the current auxiliary estimate at iteration $k \ge 0$, which defines a stochastic gradient $u_{k+1} = u_{k+1}(x_k)$, and introduce two variables: (a) $\psi_k$, defined by the stochastic gradients $u_{k+1} = u_{k+1}(x_k)$ as the result of descent in the dual space $E^*$; (b) $x_k$, representing the "mirror image" of $\psi_k$ in the primal space. In order to properly adjust the algorithm, it is essential to fix two positive sequences: $(\gamma_k)_{k\ge0}$ (a step gain) and $(\beta_k)_{k\ge0}$ (a "temperature") with $\beta_k \ge \beta_{k-1}$, $\forall\, k \ge 1$. Let a stochastic matrix $A \in \mathcal{A}_N$ be given. The mirror descent algorithm with the entropy proxy function (22) and the related Gibbs potential $\nabla W_\beta(\cdot)$ is defined as follows:

• Fix the initial values $x_0 \in \Theta_N$ and $\psi_0 = 0 \in \mathbb{R}^N$. Specify the positive sequence $(\gamma_k)_{k\ge1}$, the initial parameter
\[
\beta_0 = (2\ln N)^{-1/2}, \tag{25}
\]
and define the horizon $n > 1$. At each $k = 0, \dots, n-1$, given $x_k$ and $\psi_k$, one draws the two random indices $\eta_k$ and $\xi_k$, related to the conditional distributions (8) and (9), and calculates the realization of the stochastic gradient $u_{k+1}(x_k) = \zeta_k$ from (10), i.e.
\[
u_{k+1}(x_k) = \bigl(A_{(\xi_k)}\bigr)^T - A^{(\eta_k)} - \bigl(A_{(\eta_k)}\bigr)^T + x_k; \tag{26}
\]
then one implements the recursive calculation
\[
\psi_{k+1} = \psi_k + \gamma_k\, u_{k+1}(x_k), \qquad
x_{k+1} = -\nabla W_{\beta_k}(\psi_{k+1}), \qquad
\beta_{k+1} = \Bigl(\beta_k^2 + \frac{\|u_{k+1}(x_k)\|_\infty^2}{\ln N}\Bigr)^{1/2}. \tag{27}
\]
Equations (23)–(24) are applied here.

• The $n$-th iteration defines the convex combination, the estimate $\bar{x}_n$ at the horizon $n$, that is
\[
\bar{x}_n = \Bigl(\sum_{k=0}^{n}\gamma_k\Bigr)^{-1}\sum_{k=0}^{n}\gamma_k\, x_k. \tag{28}
\]

Note that the entries $x_k^{(j)}$ of the vector $x_k$ in (27) may be represented in a non-recursive way, i.e.
\[
x_k^{(j)} = \frac{\exp\bigl(-\psi_k^{(j)}/\beta_k\bigr)}{\displaystyle\sum_{t=1}^{N}\exp\bigl(-\psi_k^{(t)}/\beta_k\bigr)}
= \frac{\exp\Bigl(-\beta_k^{-1}\sum_{m=1}^{k}\gamma_m\, u_m^{(j)}(x_{m-1})\Bigr)}{\displaystyle\sum_{t=1}^{N}\exp\Bigl(-\beta_k^{-1}\sum_{m=1}^{k}\gamma_m\, u_m^{(t)}(x_{m-1})\Bigr)},
\]
where $u_m^{(j)}(x)$ is the $j$-th entry of the vector $u_m(x)$, $j = 1, \dots, N$.

B. Choice of parameters

Now specify the parameter sequence
\[
\gamma_k \equiv 1. \tag{29}
\]
Thus the algorithm takes the form: $\forall\, k = 0, 1, \dots, n$,
\begin{align}
\psi_{k+1} &= \psi_k + u_{k+1}(x_k), \tag{30}\\
x_{k+1} &= -\nabla W_{\beta_k}(\psi_{k+1}), \tag{31}\\
\bar{x}_{k+1} &= \bar{x}_k - \frac{1}{k+1}\,(\bar{x}_k - x_k), \tag{32}\\
\beta_{k+1} &= \Bigl(\beta_k^2 + \frac{\|u_{k+1}(x_k)\|_\infty^2}{\ln N}\Bigr)^{1/2}, \tag{33}
\end{align}
with the initial values $\psi_0 = 0$, $x_0 \in \Theta_N$ and with $\beta_0$ from (25). Equations (23)–(24) are applied in (31). Notice that the recursive form (32) does not require knowing the horizon $n$ in advance; the algorithm becomes completely recursive. In our papers [10], [11], [12] we studied a non-adaptive version of the algorithm where $\beta_k = \beta_0\sqrt{1+k}$ was chosen instead of (33).

Remark 1: Introduce
\[
c_k \triangleq \frac{\|u_{k+1}(x_k)\|_\infty^2}{\ln N}, \qquad \forall\, k \ge 0. \tag{34}
\]
The recursive algorithm (cf. [15])
\[
\beta_{k+1} = \beta_k + \frac{c_k}{2\beta_k}, \qquad \forall\, k \ge 0, \tag{35}
\]
implies
\[
\beta_{k+1}^2 = \beta_k^2 + c_k + \frac{c_k^2}{4\beta_k^2} \;\ge\; \beta_k^2 + c_k. \tag{36}
\]
Hence, the recursive algorithm (33) generates a sequence $(\beta_k)_{k\ge0}$ which is smaller than that of (35), subject to the same initial $\beta_0 > 0$. As one can see below in the proof, the risk upper bound
\[
\mathbb{E}\|A\bar{x}_n - \bar{x}_n\|_2^2 \le \frac{\ln N}{n}\,O(\mathbb{E}\,\beta_n)
\]
can therefore be improved in Theorem 1 below, at least potentially. □
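Putting the pieces together, a compact sketch of the fully specified algorithm (29)–(33) might look as follows; it is illustrative only, assumes a dense column-stochastic NumPy array $A$, and reuses the hypothetical helpers `stochastic_gradient()` and `neg_grad_W()` sketched earlier.

```python
import numpy as np

def adaptive_mirror_descent(A, n, seed=0):
    """Sketch of the adaptive algorithm (29)-(33) with the averaging (32).

    Returns the averaged estimate x_bar of the eigenvector; names and the
    dense representation of A are assumptions made for this example.
    """
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    logN = np.log(N)
    x = np.full(N, 1.0 / N)            # x_0: uniform point of the simplex
    x_bar = x.copy()                   # running average, eq. (32)
    psi = np.zeros(N)                  # dual variable, psi_0 = 0
    beta = 1.0 / np.sqrt(2.0 * logN)   # beta_0 from eq. (25)
    for k in range(n):
        u = stochastic_gradient(A, x, rng)                             # eq. (26)
        psi = psi + u                                                  # eq. (30)
        x_next = neg_grad_W(psi, beta)                                 # eq. (31)
        x_bar = x_bar - (x_bar - x) / (k + 1)                          # eq. (32)
        beta = np.sqrt(beta**2 + np.linalg.norm(u, np.inf)**2 / logN)  # eq. (33)
        x = x_next
    return x_bar
```

Each pass touches one column and one row of $A$ plus a length-$N$ softmax, i.e. $O(N)$ work per iteration, in line with Remark 2 below.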

C. Main Result

Theorem 1: Let $N \ge 2$ and the estimate $\bar{x}_n$ be defined by the randomized algorithm (27)–(33) with the stochastic gradient (26). Then for any $A \in \mathcal{A}_N$ and $n \ge 1$
\[
\mathbb{E}\|A\bar{x}_n - \bar{x}_n\|_2^2 \le \frac{3(\ln N)^{1/2}}{n}\;\mathbb{E}\sqrt{2 + \sum_{t=0}^{n-1}\|u_{t+1}(x_t)\|_\infty^2}. \tag{37}
\]
Consequently, by inequality (19),
\[
\mathbb{E}\|A\bar{x}_n - \bar{x}_n\|_2^2 \le 6\,(\ln N)^{1/2}\,\frac{\sqrt{1+n}}{n}. \tag{38}
\]
□

Notice that obviously $\|Ax - x\|_2^2 \le 4$ for all $x \in \Theta_N$; thus the RHS of (37)–(38) could be improved by taking the minimum of it and four.

Remark 2: One can verify that each iteration of the algorithm (27) requires $O(N)$ arithmetical operations (provided that $A$ is dense), as is seen from (10) and (24). This is in contrast with the $O(N^2)$ operations per iteration of the power method. □

V. SKETCH OF THE PROOF

We first refer to [8] for the auxiliary Propositions 2 and 3, proved under more general assumptions therein; cf. also [9]. Introduce
\[
Q(x) \triangleq \tfrac{1}{2}\,\|Ax - x\|_2^2, \qquad \nabla Q(x) = \mathbb{E}\,u_k(x), \qquad \varphi_k(x) = u_k(x) - \nabla Q(x), \qquad \forall\, x \in \Theta_N,
\]
where the random vectors $u_k(x)$ are the stochastic gradients. Recall the key property (19), i.e. $\mathbb{E}\|u_{k+1}(x_k)\|_\infty^2 \le 4$ in (19), (26) and (30), which is used in the proof below.

Proposition 2: For any $x \in \Theta_N$ and any integer $t \ge 1$ the following inequality holds:
\[
\sum_{i=1}^{t}\gamma_i\,(x_{i-1} - x)^T\nabla Q(x_{i-1})
\;\le\; \beta_t V(x) - \beta_0 V(\theta_*) - \sum_{i=1}^{t}\gamma_i\,(x_{i-1} - x)^T\varphi_i(x_{i-1})
+ \sum_{i=1}^{t}\frac{\gamma_i^2}{2\alpha\beta_{i-1}}\,\|u_i(x_{i-1})\|_\infty^2.
\]
Here $\theta_* = \operatorname*{argmin}_{\theta\in\Theta_N} V(\theta)$. □

Proposition 3: For any $x \in \Theta_N$ and any integer $t \ge 1$ the following inequality holds true:
\[
\mathbb{E}\,Q(\bar{x}_t) \;\le\; \mathbb{E}\,Q(x) + \mathbb{E}\Biggl\{\Bigl(\sum_{i=1}^{t}\gamma_i\Bigr)^{-1}
\Bigl[\beta_t V(x) - \beta_0 V(\theta_*) + \sum_{i=1}^{t}\frac{\gamma_i^2}{2\alpha\beta_{i-1}}\,\|u_i(x_{i-1})\|_\infty^2\Bigr]\Biggr\}. \;\; \square
\]

Hence the expected accuracy of the estimate $\bar{x}_t$ satisfies the following upper bound:
\[
\mathbb{E}\,Q(\bar{x}_t) \;\le\; \Bigl(\sum_{i=1}^{t}\gamma_i\Bigr)^{-1}
\mathbb{E}\Bigl[\beta_t V(x^*_Q) - \beta_0 V(\theta_*) + \sum_{i=1}^{t}\frac{\gamma_i^2\,\|u_i(x_{i-1})\|_\infty^2}{2\alpha\beta_{i-1}}\Bigr], \tag{39}
\]
where $x^*_Q \in \operatorname*{Argmin}_{x\in\Theta_N} Q(x)$ and $V(\theta_*) = \min_{\theta\in\Theta_N} V(\theta)$.

Let us prove Theorem 1.

Proof: Observe that $V(\theta_*) = 0$ in (22), $Q(x^*_Q) = 0$ and $V(x^*_Q) \le \max_{x\in\Theta_N} V(x) \triangleq V^* = \ln N$. Applying (39) under $\gamma_k \equiv 1$ and
\[
\beta_k = \sqrt{\beta_0^2 + \sum_{i=1}^{k}\frac{\|u_i(x_{i-1})\|_\infty^2}{\ln N}} = \sqrt{\beta_0^2 + \sum_{i=1}^{k} c_i},
\]
with $\beta_0$ defined by (25) and
\[
c_k \triangleq \frac{\|u_k(x_{k-1})\|_\infty^2}{\ln N} \le \frac{4}{\ln N},
\]
we obtain
\[
\mathbb{E}\,Q(\bar{x}_n) \le \frac{1}{n}\,\mathbb{E}\Bigl[\beta_n\ln N + \sum_{k=1}^{n}\frac{\|u_k(x_{k-1})\|_\infty^2}{\beta_{k-1}}\Bigr]
= \frac{\ln N}{n}\,\mathbb{E}\Bigl[\beta_n + \sum_{k=1}^{n}\frac{c_k}{\beta_{k-1}}\Bigr].
\]
So,
\[
\sum_{k=1}^{n}\frac{c_k}{\beta_{k-1}}
= \sum_{k=1}^{n}\frac{c_k}{\beta_k} + \sum_{k=1}^{n} c_k\Bigl(\frac{1}{\beta_{k-1}} - \frac{1}{\beta_k}\Bigr)
\le \sum_{k=1}^{n}\frac{c_k}{\sqrt{\beta_0^2 + \sum_{i=1}^{k} c_i}} + \frac{4}{\beta_0\ln N}
\le \int_{0}^{\sum_{i=1}^{n} c_i}\frac{dv}{\sqrt{\beta_0^2 + v}} + \frac{4}{\beta_0\ln N}
= 2\beta_n - 2\beta_0 + \frac{4}{\beta_0\ln N},
\]
and we obtain
\[
\mathbb{E}\,Q(\bar{x}_n) \le \frac{3\ln N}{n}\,\mathbb{E}\sqrt{\beta_0^2 + \sum_{t=1}^{n} c_t}
\le \frac{3\sqrt{\ln N}}{n}\,\mathbb{E}\sqrt{2 + \sum_{t=0}^{n-1}\|u_{t+1}(x_t)\|_\infty^2}.
\]
We have also applied the following assertion: the entropy proxy function (22) attains its maximum at each vertex of the simplex $\Theta_N$, that is,
\[
V^* = \max_{x\in\Theta_N} V(x) = \ln N.
\]
The Theorem is proved. ∎

VI. SOME COMMENTS

There are several ways to modify the algorithm in order to improve its convergence.

1. We can generate $n_k \in \mathbb{N}_+$ independent realizations of the stochastic gradient $\zeta_k(t)$, $t = 1, \dots, n_k$, from (10) at each iteration $k$ and calculate their arithmetic mean
\[
\bar{\zeta}_k = \frac{1}{n_k}\sum_{t=1}^{n_k}\zeta_k(t)
\]
in order to use the arithmetic means $\bar{\zeta}_k$ as more precise estimates of the gradients (18). This can be a source of more effective algorithms and a basis for distributed versions of the method.

2. There are also other approaches to stochastic gradient generation. For instance, we can deal not with the quadratic function $Q(x)$ but with other functions.

3. The choice of parameters suggested in Section IV-B is not the only possible one. A more flexible strategy can lead to faster convergence.

4. Sometimes better estimates for $u_{t+1}(x_t) = \zeta_t$ than (19) are available; then the estimate (37) can be better than (38).

Another problem of interest is the rate of convergence to the eigenvector $x_*$, even when it is unique. Indeed, we have the estimate for $\mathbb{E}\|A\bar{x}_n - \bar{x}_n\|_2^2$, but what can one say about $\|\bar{x}_n - x_*\|_2^2$? The following proposition (which we provide without proof) clarifies the situation.

Proposition 4: Let the stochastic matrix $A$ be irreducible. Then it has the unique eigenvector $x_* \in \Theta_N$ and, for a positive constant $c = c(A) > 0$,
\[
\|Ax - x\|_2^2 \ge c\,\|x - x_*\|_2^2 \tag{40}
\]
for all $x \in \Theta_N$. □

However, the constant $c$ depends on $A$ (and, in particular, on the second largest in absolute value eigenvalue of $A$).

VII. APPLICATION TO PAGERANK

The famous PageRank problem [1], [2], [3], [4] can be treated in the framework of the above problem statement; however, some details are worth mentioning. The initial link matrix $A$ represents the web graph with $N$ nodes (pages). Its entries are $a_{ij}$, where $a_{ij} = 1/n_j$ if page $j$ has an outgoing link to page $i$ ($n_j$ being the total number of outgoing links of $j$), and $a_{ij} = 0$ otherwise. The desired ranks $x^{(j)}$ of the pages satisfy the equation $Ax = x$. The matrix $A$ may be non-stochastic due to the existence of dangling nodes, i.e. pages having no outgoing links. To avoid this difficulty the matrix is redefined: $a_{ij} = 1/N$, $i = 1, \dots, N$, for all dangling nodes $j$. Now the matrix $A$ becomes stochastic, and the eigenvector $x_* \in \Theta_N$, i.e. $Ax_* = x_*$, always exists. However, it may be non-unique. Traditionally, to overcome this difficulty researchers deal with the transformed matrix
\[
M = (1-m)A + \frac{m}{N}\,S,
\]
where $0 < m < 1$ is a parameter while $S$ is the matrix with all entries equal to 1. The resulting matrix $M$ has all positive entries, $x_*$ is unique and the power method can be applied.

Its rate of convergence is $\|x_n - x_*\|_2 \le c\,(1-m)^n$; see [3]. As proposed in [2], $m = 0.15$ is usually taken. Studies confirm that the dependence on the parameter $m$ can be rather strong [3], and it is not obvious what the meaning of the solution $x_*$ is for an $m$ that is not small (like $m = 0.15$). The benefit of the approach presented in our paper is the opportunity to deal with small $m > 0$, or even with $m = 0$, when the power method converges slowly or even diverges. The matrix $A$ for PageRank problems is very sparse; moreover, just the link graph needs to be stored. This allows one to implement the proposed method effectively. The results of numerical simulation are rather preliminary. We tested the method on models of PageRank problems with $N$ of order 1000–10000. Such problems could be solved on a standard PC. It was hard to obtain high accuracy of the solution; however, in real-life PageRank only the pages with relatively high rank are of interest, and these ranks were reconstructed relatively well. More detailed simulation results will be provided in later publications.
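For concreteness, a small dense sketch of the matrix construction described in this section is given below; it is illustrative only (the function name and the `links` format are assumptions), and a practical implementation would keep $A$ sparse and store only the link graph, as noted above.

```python
import numpy as np

def pagerank_matrix(links, N, m=0.0):
    """Assemble the column-stochastic PageRank matrix described above.

    Illustrative sketch: `links` is an iterable of (j, i) pairs meaning page j
    links to page i. Dangling columns are replaced by the uniform column 1/N,
    and M = (1 - m) A + (m / N) S for a damping parameter m (m = 0 keeps A).
    """
    A = np.zeros((N, N))
    for j, i in links:
        A[i, j] = 1.0                 # raw adjacency: column j lists out-links of j
    out_deg = A.sum(axis=0)
    for j in range(N):
        if out_deg[j] > 0:
            A[:, j] /= out_deg[j]     # a_ij = 1 / n_j
        else:
            A[:, j] = 1.0 / N         # dangling-node fix
    return (1.0 - m) * A + m / N      # M = (1 - m) A + (m / N) S
```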

VIII. CONCLUSIONS

An adaptive randomized method for finding the eigenvector corresponding to eigenvalue 1 of a stochastic matrix has been proposed. The upper bound on its rate of convergence is of non-asymptotic type and has an explicit factor; see Theorem 1. Moreover, the bound is valid for the whole class of stochastic matrices and does not depend on the properties of an individual matrix. The method can be applied to PageRank computation with a small parameter $m$. Further work on acceleration of the method is possible.

IX. ACKNOWLEDGMENTS

The authors are grateful to Roberto Tempo, who attracted our attention to the PageRank problem, to Arkadi Nemirovski for very important ideas on randomized methods, to Anatoli Juditsky for providing some references, and to Elena Gryazina, who performed the calculations.

REFERENCES

[1] H. Ishii and R. Tempo, "A distributed randomized approach for the PageRank computation: Part 1," Proc. 47th IEEE Conf. on Decision and Control, pp. 3523–3528, 2008.
[2] S. Brin and L. Page, "The anatomy of a large-scale hypertextual web search engine," Comput. Netw. ISDN Syst., 30(1-7):107–117, 1998.
[3] A.N. Langville and C.D. Meyer, Google's PageRank and Beyond: The Science of Search Engine Rankings, Princeton University Press, 2006.
[4] H. Ishii and R. Tempo, "A distributed randomized approach for the PageRank computation: Part 2," Proc. 47th IEEE Conf. on Decision and Control, pp. 3529–3534, 2008.
[5] R. Tempo, G. Calafiore, and F. Dabbene, Randomized Algorithms for Analysis and Control of Uncertain Systems, Springer, London, 2005.
[6] A.S. Nemirovski and D.B. Yudin, Problem Complexity and Method Efficiency in Optimization, Wiley-Interscience, New York, 1983.
[7] Yu. Nesterov, "Primal-dual subgradient methods for convex problems," Mathematical Programming, 2007. DOI: 10.1007/s10107-007-0149-x.
[8] A.B. Juditsky, A.V. Nazin, A.B. Tsybakov, and N. Vayatis, "Recursive aggregation of estimators by the mirror descent algorithm with averaging," Problems of Information Transmission, 41(4):368–384, 2005.
[9] A. Juditsky, G. Lan, A. Nemirovski, and A. Shapiro, "Stochastic Approximation Approach to Stochastic Programming," 2007. http://www.optimization-online.org/DB_HTML/2007/09/1787.html
[10] A.V. Nazin, "Solution to a Particular Deterministic Problem in Finite High Dimension by the Recursive Randomized Mirror Descent Algorithm," 8th Meeting on Mathematical Statistics, CIRM, Luminy, France, 2008. http://www.cirm.univ-mrs.fr
[11] A.V. Nazin and B.T. Polyak, "The Randomized Algorithm for Finding an Eigenvector of the Stochastic Matrix with Application to PageRank," Doklady Mathematics, 79(3):424–427, 2009.
[12] A.V. Nazin and B.T. Polyak, "A Randomized Algorithm for Finding Eigenvector of Stochastic Matrix with Application to PageRank Problem," 3rd IEEE Multi-conference on Systems and Control (MSC 2009), Saint-Petersburg, Russia, July 8-10, 2009.
[13] R.T. Rockafellar and R.J.B. Wets, Variational Analysis, Springer, New York, 1998.
[14] A. Ben-Tal and A.S. Nemirovski, "The Conjugate Barrier Mirror Descent Method for Non-Smooth Convex Optimization," MINERVA Optimization Center Report, Faculty of Industrial Engineering and Management, Technion – Israel Institute of Technology, Haifa, 1999. http://iew3.technion.ac.il/Labs/Opt/opt/Pap/CP_MD.pdf
[15] C. Tauvel, Optimisation stochastique à grande échelle, PhD thesis, Université Joseph Fourier, December 2008.

