Optimizing the Number of Robots for Web Search Engines

Optimizing the Number of Robots for Web Search Engines J. Talim

Z. Liuy

P. Nain z

E. G. Coman, Jr.x

Published in Telecommunication Systems Journal

Vol. 17, Nos 1-2, pp. 234-243, May-June 2001 Abstract

Robots are deployed by a Web search engine for collecting information from dierent Web servers in order to maintain the currency of its data base of Web pages. In this paper, we investigate the number of robots to be used by a search engine so as to maximize the currency of the data base without putting an unnecessary load on the network. We use a queueing model to represent the system. The arrivals to the queueing system are Web pages brought by the robots; service corresponds to the indexing of these pages. The objective is to nd the number of robots, and thus the arrival rate of the queueing system, such that the indexing queue is neither starved nor saturated. For this, we consider a nite-buer queueing system and dene the cost function to be minimized as a weighted sum of the loss probability and the starvation probability. Under the assumption that arrivals form a Poisson process, and that service times are independent and identically distributed random variables with an exponential distribution, or with a more general service function, we obtain explicit/numerical solutions for the optimal number of robots to deploy.

Keywords: Web search engines; Web robots; Queues.

Department of Mathematics and Statistics, University of Saskatchewan, 106 Wiggins Road, Saskatoon SK S7N

5E6, Canada; [email protected] y IBM Research, 30 Saw Mill River Roadm Hawthorne, NY 10532, USA z INRIA, B.P. 93, 06902 Sophia Antipolis, France [email protected] x Electrical Engineering Department, Columbia University, 530 W. 120th St., New York, NY 10027, USA; [email protected]

1

1 Introduction The World Wide Web has become a major information publishing and retrieving mechanism on the Internet. The amount of information as well as the number of Web servers has been growing exponentially fast in recent years. In order to help users nd useful information on the Web, search engines such as Alta Vista, HotBot, Yahoo, Infoseek, Magellan, Excite and Lycos, etc. are available. These systems consist of four main components: a database that contains web pages (full text or summary), a user interface that deals with queries, an indexing engine that updates the database, and robots that traverse the Web servers and bring Web pages to the indexing engine. Thus, the quality of a search engine depends on many factors, e.g., query response time, completeness, indexing speed, currency, and ecient robot scheduling. Our interest here focuses on the function served by robots: establishing currency by bringing new pages to be indexed and bringing changed/updated pages for re-indexing. We investigate the problem of choosing the number of robots to meet the conicting demands of low network trac and an up-to-date data base. In the classical approach of such investigations, we begin by introducing a reference model suciently simple that we can obtain explicit formulas, but suciently powerful to capture fundamental behavior, in particular the interplay among system parameters. The specic model, illustrated in Figure 1, centers on the indexing engine, which is represented by a nite, single-server queue/buer, and multiple identical robots acting independently as sources of arriving pages. When a robot arriving with a page for the indexing buer nds the buer full, the page being delivered is lost, at least temporarily. In this situation, a potential update or new page has been lost, and network congestion has been created to no benet. On the other hand, if the buer is ever empty, and hence the indexing engine is idle, data base updating is at a standstill waiting for the robots to bring more pages. To reduce the probability of the rst of these two events, we want to keep the number of robots suitably small, but to reduce the probability of the second, we want to keep the number of robots suitably large. To make the objective concrete, we will formulate a cost function as a weighted sum of the probabilities of an empty buer and a full buer. We will then study the problem of nding the number of robots that minimizes this cost function. In a more general set-up, the parameters of robot scheduling problems like ours might represent explicit limits not only on buer/storage space, but also on network bandwidth and the obsolescence of stored documents. The rst two constraints combine in certain cases in that losses created by bandwidth limits are modeled by losses created by bounds on buer capacity. If the occupancy of the indexing buer is frequently too large, then during many of the waiting times, buered documents are modied at their home sites, thus making them obsolete before they are even made a part of the data base. It is thus useful to bound waiting times by requiring (e.g.) a small buer. Because of the obsolescence issue, we would have to bound the waiting time either deterministically or stochastically, even when the buer size is eectively innite. The objective function would be slightly modied. Such variants are certainly worthwhile analyzing, and we propose them here as directions for further research. In this paper, the focus is on the simpler nite-buer model, which is still rich enough to serve as a useful reference model for studying the balance between network congestion and server utilization. 2

Robot 1

Site 1

Site 2

Database where information collected by the robots

Site 3

Robot 2

Indexing engine is stored

Site 4

Source 1

Source 2

Server with a finite buffer

Figure 1: Model of search engine with two robots There is a large literature on search engines and their components. The search engines themselves may well be their own best source of references; we recommend this entree to the research on any aspect of the subject. In particular, much can be found on the design and control (including distributed control) of robots. However, we have found very little on the modeling and analysis of robot scheduling and the indexing queue. The work in [2] is the only such eort we know about. In [2], the authors propose a natural model of Web-page obsolescence, and study the problem of scheduling a single search engine robot so as to minimize the extent to which the search engine's data base is out-of-date. Section 2 introduces the probability model, sets notation, and formalizes the optimization problem. Section 3 solves the optimization problem for exponentially distributed service times and presents an explicit computation of the optimal number of robots. The sensitivity of the results to various model parameters is also addressed. Extensions to more general service time distributions are given in Section 4. Section 5 concludes with a brief discussion of ongoing research and open issues.

2 The model The search engine is modeled as a single server nite capacity queue. The system capacity is K 2 (including the position in the server). There are N 1 robots: each robot brings new pages to the queue according to a Poisson process with rate > 0. These N Poisson processes are assumed to be mutually independent and independent of the service times. Hence, new pages are generated according to a Poisson process with intensity N . Although the Poisson assumption is mainly made 3

for sake of mathematical tractability, we do believe that it is a reasonable assumption given the high variability of network latencies. An incoming page nding a full queue is lost. We denote by F (x) = P f xg the probability distribution of the service times (with a generic service time) and by > 0 the mean service time. Dene = 1= . In this notation, we dene the cost function as the weighted sum of two terms:

the probability of an empty buer P fX = 0g, where X is a random variable representing

the stationary queue-length in a M/G/1/K queue with arrival rate N and service time distribution F ;

the probability of losing an arriving page, that is, the probability that the queue length seen by an arrival in this M/G/1/K is K , which we denote by P fX = K g. With := N= > 0, the cost function is then dened as

C (; ; K ) := P fX = 0g + P fX = K g

(1)

where is a positive constant. In the remainder of this paper we will resort to queueing theory to compute C (; ; K ) and to optimize this quantity as a function of , or equivalently of N , the number of robots. We rst cover the situation where the service times are exponentially distributed. A word on the notation in use: throughput f (K ) K !1 g(K ) will stand for limK "1 f (K )=g(K ).

3 The M/M/1/K search engine model This section denes our reference model as the well known M/M/1/K queue. We rst address the problem of nding the optimal number of sources (robots) in this model.

3.1 Optimizing the number of robots The following proposition gives the well known stationary queue-length probabilities at arbitrary epochs for this queue [3]:

Proposition 1 For any > 0

8 > >
:

0

P (X = i) = >

for i > K:

When = 1 the non-zero stationary queue-length probabilities at arbitrary epochs are all equal and given by Prob (X = i) = 1=(K + 1) for i = 0; 1; : : : ; K .

4

The expression for the cost function C (; ; K ) now ows from Proposition 1 and the PASTA property, which ensures that in the M/M/1/K queue the stationary queue-length probabilities at arbitrary epochs and the stationary queue-length probabilities at arrival epochs are equal (i.e., P (X = k) = P (X = k)). We nd that 8 > > >
> > :

(1 ? )( + K ) for 6= 1 1 ? K +1

+1 for = 1. 1+K

(2)

Note that for any K 2 and > 0, the mapping ! C (; K; ) is continuous and dierentiable in (0; 1), including the point = 1.

Lemma 1 For any > 0, K 2, the mapping ! C (; ; K ) has a unique minimum in [0; 1),

to be denoted ( ; K ). Furthermore, 0 < ( ; K ) < 1 if < 1, (1; K ) = 1 and ( ; K ) > 1 if

> 1.

Proof. Fix K 2. We have

@C (; ; K ) = R(; ; K ) @ (1 ? K +1)2

with

R(; ; K ) = 2K ? K K +1 + ( ? 1)(K + 1)K + KK ?1 ? : Tedius but elementary algebra show that the polynomial R(; ; K ) in the variable

(3) (4)

(i) has a zero of multiplicity two (respectively, three) at point = 1 when 6= 1 (respectively,

= 1); (ii) has a zero of multiplicity one in [0; 1) and no zero in (1; 1) when < 1; (iii) has a zero of multiplicity one in (1; 1) and no zero in [0; 1) when > 1; (iv) has no zero other than 1 when = 1. We deduce from the above that

@C (; ; K ) = (1 ? )2 Q(; ; K ) @ (1 ? K +1 )2 where Q(; ; K ) is a polynomial in the variable with a single zero ( ; K ) in [0; 1) with ( ; K ) < 1 if < 1, (1; K ) = 1 and ( ; K ) > 1 if > 1. Furthermore, the inequality Q(0; ; K ) = ? < 0 implies that, for any > 0, we have Q(; ; K ) < 0 for 0 < ( ; K ) and Q(; ; K ) > 0 for > ( ; K ), which proves the lemma. 5

It is worth observing that the optimum ( ; K ) does not depend on the buer size K when = 1. This means that if the same weight is given to the probability of starvation and to the loss probability, then the optimal arrival rate is equal to the service rate, independent of the buer size. We now return to the original problem, namely the computation of the number N of robots that minimizes the cost function C (; ; K ) with = N=. The answer is found in the next result which is a direct corollary of Lemma 1.

Proposition 2 For any > 0, K 2, let N ( ; K ) be the optimal number of robots to use. Then, N ( ; K ) = arg minn C (n=; ; K ) (5) with n 2 fb( ; K )=c; d( ; K )=eg, where for any real number x, bxc (respectively dxe) denotes the largest (respectively smallest) integer less (respectively greater) than or equal to x. In the next section, we investigate how the parameter inuences the optimal number of robots.

3.2 Impact of on the optimal number of robots Recall that the parameter is a positive constant that allows us to stress either the loss probability or the probability of starvation. Part of the impact of on ( ; K ), and therefore on N ( ; K ), the optimal number of robots, is captured in the following result.

Proposition 3 For any K 2, the mapping ! ( ; K ) is nondecreasing in [0; 1), with lim !1 ( ; K ) = 1. Proof. Pick two constants 0 < < and dene 1

2

(; 1 ; 2 ; K ) := C (; 2 ; K ) ? C (; 1 ; K ) = 1 ?1 ?K+1 ( 2 ? 1 ): We assume that ( 2 ; K ) < ( 1 ; K ) and show that this yields a contradiction. Under the condition 1 < 2 the mapping ! (; 1 ; 2 ; K ) is strictly decreasing in [0; 1). Therefore,

0 < (( 2 ; K ); 1 ; 2 ; K ) ? (( 1 ; K ); 1 ; 2 ; K ) = [C (( 2 ; K ); 2 ; K ) ? C (( 1 ; K ); 2 ; K )] + [C (( 1 ; K ); 1 ; K ) ? C (( 2 ; K ); 1 ; K )] 0 (6) where the last inequality follows from the denition of ( ; K ). Since ! (; 1 ; 2 ; K ) is strictly decreasing on [0; 1) we deduce from (6) that ( 2 ; K ) ( 1 ; K ) must hold, which contradicts the 6

11

10

9

8

7

6

5

4

3

K=5 K = 20

2 0

1

2

3

4

5

6

Figure 2: The mapping ! N ( ; K ) (= = 5) assumption that ( 2 ; K ) < ( 1 ; K ). Therefore ( 2 ; K ) ( 1 ; K ) and the mapping ! ( ; K ) is nondecreasing in [0; 1). On the other hand, one can check that @C (; ; K )=@ 0 for = ( =K )1=(K ?1) when > 0. Hence, by Lemma 1, we conclude that 1=(K ?1) 0 ( ; K ) := K ( ; K ); 8 > 0:

(7)

Letting tend to innity on both sides of (7) yields the second result of the proposition. Proposition 3 has a simple physical interpretation. As the parameter increases the probability of starvation becomes the main quantity to minimize. The minimization is done by increasing the arrival rate or, equivalently, by increasing the number of robots. Figure 2 provides two numerical examples, illustrating the monotonicity of the optimal number of robots. The next section focuses on the impact of the buer size K on the optimal number of robots.

7

3.3 Impact of K on the optimal number of robots In this section, we examine the behavior of ( ; K ) as a function of K . The rst result establishes an upper bound on ( ; K ) that complements the lower bound given in (7).

Lemma 2 For any 1, K 2 ( ; K ) < 1( ; K ) := ((K + 1) )1=(K ?1) :

(8)

Proof. Thanks to Lemma 1, it is enough to show that @C (; ; K )=@ > 0 at the point = ( ; K ) 1

or, equivalently from (3), that R(1 ( ; K ); ; K ) > 0. By writing R(; ; K ) in the form ?

R(; ; K ) = K +1 K ?1 ? K + ( ? 1)(K + 1)K + KK ?1 ? ; we nd that K+1

K

R(1 ( ; K ); ; K ) = ((K + 1) )) K?1 + ( ? 1)(K + 1) ((K + 1) ) K?1 + (K (K + 1) ? 1) which is strictly positive, in particular for K 2 and 1. Using Lemma 1 together with the lower and upper bounds on ( ; K ) reported in (7) and (8), respectively, we get that 0 ( ; K ) ( ; K ) < 1; for 0 < < 1 (9) and 1 ( ; K ) ( ; K ); for > 1: (10) By combining (9) and (10) with the limits limK "1 0 ( ; K ) = limK "1 1 ( ; K ) = 1 ( > 0) and the identity (1; K ) = 1 (see Lemma 1), we conclude that

lim ( ; K ) = 1

(11)

K !1

for any > 0. In other words, we have shown that the optimal arrival rate converges to the service capacity when the buer size increases to innity. The limiting result (11) can be used to nd an approximation for the optimal number of robots to be deployed when K is large. Indeed, the relation

lim N ( ; K ) = Klim arg !1

K !1

min

n2fb=c;d=eg

C ( n=; ; K );

(12)

which follows from (5) and (11), suggests the following approximation, for large K : 8
0 when > 1 (see Proposition 1). It is not an easy task to study the behavior of ( ; K ) as a function of K . We suspect the mapping K ! ( ; K ) to be increasing when 0 < < 1 and decreasing when > 1, but we have not been able to prove it. The conjectured behavior of the mapping K ! ( ; K ) (resp. K ! N ( ; K )) is illustrated in Figure 3 (resp. Figure 4). Figure 5 displays the behavior of the optimal number of robots as a function of the ratio = when the buer size is innite. In both curves the parameter

is held xed and taken equal to 0:5 and 2, respectively. We see from Figure 5 that the optimal number of robots N ( ; 1) is equal to 6 whenever = = 6 and the buer size is innite, for 2 f0:5; 2g. Figure 4(b) tells us that for K 13 (resp. K 18) N ( ; 1) gives the correct value for N ( ; K ) when = 0:5 (resp. = 2), which seems to indicate that the accuracy of the approximation (14) may be very sensitive to the model parameters. 1

2.8 2.6

0.9

2.4 0.8 2.2 0.7

2

0.6

1.8 1.6

0.5 1.4 0.4

1.2

0.3

1 2

4

6

8

10

12

14

16

18

20

2

4

6

(a) = 0:5

8

10

12

(b) = 2 Figure 3: The mapping K ! ( ; K )

9

14

16

18

20

16

16

= = 20::05

14

12

= = 20::05

14

12

10

10

8

8

6

6

4

4

2

2 2

4

6

8

10

12

14

16

18

20

2

4

(a) = = 5:5

6

8

10

12

14

16

18

20

(b) = = 6

Figure 4: The mapping K ! N ( ; K )

7

6

Opt(0.5,x) Opt(2.0,x)

5

4

3

2

1

0 0

1

2

3

4

Figure 5: The mapping = ! N ( ; 1)

10

5

6

7

4 The M/G/1/K queue search engine model The queueing model in Section 3 is again considered in this section but we now relax the assumption that the service times are exponentially distributed, i.e. we model the search engine as an M/G/1/K queue with service time distribution F . Explicit results are harder to come by.

4.1 Preliminaries We rst introduce additional notation and denitions. For 0 a.s. (service times have no mass at zero) and that the LST of the service times is a rational function. More precisely, we consider the situation where

F () = BA(()) P

P

with A() := qi=0 ai i and B () := ri=0 bi i are relatively prime polynomials (they have no common roots). Since A() and B () are relatively prime polynomials the coecients a0 and b0 cannot both be equal to 0 (otherwise A() and B () would have the common factor ); this in turn implies that b0 6= 0 since 1 = F (0) 6= 0. Lastly, we observe that q > r under the condition lim!+1 F () = 0, which follows from the assumption that > 0 a.s. Recall the denition of G (!) (see Lemma 3). The above setting implies that

Q(!; )

1

G(!) = R(!; )

(22)

with

Q(!; ) := A((1 ? !)= ) =

q X i=0

ai() !i

R(!; ) := B ((1 ? !)= ) ? ! A((1 ? !)= ) = 12

(23) q+1 X i=0

bi() !i

(24)

where

ai () := (?1)

i

q X j aj j =i

j i j ; for 0 i q

(25)

and

b0 () :=

r X j =0

bj j j " r X j aj

q X

#

bj j ; for 1 i r j bi () := (?1)i + j i i ? 1 j j =i j =i?1 q X aj j ; for r + 1 i q + 1: j i?1 bi () := (?1) i ? 1 j j =i?1 j

(26)

Note that fai (); bi+1 (); 0 i qg are all polynomials of degree q and that b0 () is a polynomial of degree r. The next result provides an ecient scheme of convolution type for computing K () and subsequently the cost C (; ; K ) in (21).

Lemma 4 For K 2, 0, (i) b0 () 6= 0; (ii) K () can be computed by means of the following recursion: 8 > > > > > > > > > > > >
> > > > > > > > > > > :

a0 () b0()

for K = 2

?2 b () aK ?2 () ? KX i K ?i() for 3 K q + 2 b0 () b 0 () i=1

?

q+1 X bi ()

K ?i() b 0 () i=1

(27)

for K q + 3:

Proof. From (26) and the denition of the polynomial B () we see that b () = B (= ). Therefore, b () = 0 would imply that F (= ) = 0 since A( ) = 6 0 (as A and B have no common roots), which would yield a contradiction, since F (= ) = E[exp(?= )] > 0. Consequently, b () = 6 0 for all 0 and K 2. The proof of (ii) is an easy induction on K using K () = (1=(K ? 2)!) lim!# dK ? G (!)? =d!K ? . 0

0

0

0

It is omitted for the sake of conciseness.

13

2

1

2

The complexity of computing fK (); K 2g can be reduced even further. Indeed, we observe from (27) that for K q + 3, K () satises a linear relation of order q + 1, namely,

K () = c1K ?1 () + + cq+1 K ?q?1()

(28)

for K q + 3, with cn := ?bn ()=b0 (). We can then invoke the theory of linear relations [5, Theorem 2.2, p. 48] to conclude from the above that

K () =

q+1 X i=1

Ki ()

i X j =1

K j?1 di;j (); K 2;

(29)

where fi (); 1 i mg, 1 m q + 1, are the distinct roots of the (characteristic) polynomial

P (x) = xq+1 ? c1 xq ? c2 xq?1 ? ? cq+1 ; with i the multiplicity of i . The unknown coecients fdi;j (); 1 j i ; 1 i mg in (29) are computed from the initial conditions on 2 (); : : : ; q+2 () that can themselves be obtained from (27).

Remark 2 Since Pqi bi() = 0 by denition of R(!; ) (take ! = 1 in (24)), the linear relation +1 =0

of order q + 1 in (28) can be reduced to a linear relation of order q. This procedure is illustrated in Section 4.4.

4.3 Asymptotic behavior of the optimal number of robots Unlike in the M/M/1/K case, an explicit computation for the optimal number of robots to deploy is out of reach. Even the task of showing the uniqueness of this optimum is non-trivial. In this section, we will content ourselves with the derivation of basic asymptotics. We rst show the existence of a nite optimum.

Lemma 5 For any > 0, K 2, there exist nite real numbers 0 ( ; K ) < ( ; K ) < < n( ; K ), 1 n < 1, that minimize the cost C (; ; K ) over [0; 1). 1

2

Proof. It is easily shown by an induction on K using (27) that K () is a rational function in the

variable , namely, K () = f ()=g() with f () and g() polynomials of degree qK ?1 and rK ?1, respectively (g() = b0 ()K ?1 ). Since q > r this implies that K () has a limit which is innite as ! 1; since K () 1 for all 0 and K 2 as pointed out in Remark 1, we deduce that, necessarily, lim K () = +1; (30) !1 which together with (21) implies that

lim C (; ; K ) = 1:

!1

14

On the other hand, (30) also implies that there exists 0 0 such that K (0 ) > , which in turn implies that C (0 ; ; K ) < 1 from (21). This shows that the (continuous) mapping ! C (; ; K ) reaches is minimum in [0; 1). The number of points in [0; 1) where C (0 ; ; K ) its minimum is nite as a consequence of the fact that ! C (; ; K ) is a rational function because K () is. This concludes the proof. The next result shows that the optimal number of robots increases to innity as the coecient increases to innity.

Proposition 4 For K 2,

lim 1 ( ; K ) = +1:

!1

Proof. Throughout the proof, K 2 is held xed. We know (see the proof of Lemma 5) that K () = f ()=b ()K ? , with f a polynomial of degree qK ? . Since b () = 6 0 for 0 (see Lemma 4), we deduce that the mapping ! K () is continuous on [0; 1), which in turn implies from (21) and the condition K () 1 for all 0 (see Remark 1) that, for any > 0, the mapping ! C (; ; K ) is also continuous on [0; 1). Assume that lim inf !1 ( ; K ) = L < 1. Then there exists a sequence f n gn with limn!1 n = +1 such that limn!1 ( n ; K ) = L. With the continuity of the mappings ! K () and ! C (; ; K ) on [0; 1); this implies that (see (21)) lim C ( ( n ; K ); n ; K ) = +1: n!1 0

1

1

0

1

1

1

To complete the proof we need to show that lim !1 C (1 ( ; K ); ; K ) < 1 when lim !1 1 ( ; K ) = +1. We see from (30) that, for any > 0, one can nd = h( ; K ) such that

P fX = 0g = 1 + 1 () < 1 K By denition (2) of the cost C (; ; K ) this implies that C ((h( ; K ); ; K ) 2 for any > 0, thereby showing that lim !1 ( ; K ) = +1 must hold. 1

We now turn our attention to the analysis of the behavior of the optimal number of robots as the buer size increases to innity. Lemma 6 below gives the asymptotics of K () as K gets large. From this result we will deduce the asymptotic behavior of C (; ; K ) (Proposition 5) and then the optimal number of robots (Proposition 6) as K goes to innity.

Lemma 6 For 0,

()?1

K () K"1 D(!)K()K 0

15

where D() is given by 8 > > > > > > > > > >
2 RQ(2)(1(1; 1) ; 1) > > > > > > > > > :

for = 1

(31)

? ! R() Q(!(!((); );)) for > 1 0

0

(1)

0

with R(i) (!; ) := @ i R(!; )=@!i for i = 1; 2.

Lemma 6 follows directly from the denition of K (), the coecient of !K ?2 in the Taylor series expansion of 1=G (!) (K 2), the denition of !0 () and () (see the beginning of Section 4.1), and [5, Theorem 4.1, p. 159].

Proposition 5 For any > 0 K "1

C (; ; K )

8 > > < > > :

(1 ? ) for 0 < 1 0 for = 1 ? 1 for > 1.

Proof. Denition (21) of C (; ; K ) and Lemma 6 already imply that 8 > > >
0, the cost C (; ; K ) is minimized at = 1 when K ! 1.

In direct analogy with the M/M/1/K case (see (13)) Proposition 6 suggests the following approximation for the optimal number of robots N ( ; K ) when K is large: 8
> ? 1 (1 ? ) . > : b=c if 8 > > >
0, with f (!; ) := (1 ? !)= . 1

2

1

2

1

1

2

2

The coecients of the polynomials Q and R in (22) are easily identied; we nd that 17

a () = (= ) + ( + )= + 2

0

1

2

1

2

a () = ?2(= ) ? ( + )= 2

1

a () = (= ) 2

1

2

2

and

b () = (1 ? p) = + 0

1

1

2

b () = ?(= ) ? ((2 ? p) + ) = ? 1

2

1

2

1

2

b () = 2(= ) + ( + )= b () = ?(= ) . 2

3

2

1

2

2

Applying Lemma 4 we see that the unknown quantity K () in (21) is given by

2 () = ab 0(()) 0 3 () = ab1(()) ? a0b(()b)12() 0 0 2 4 () = ab2(()) ? a1b(()b)12() + a0 (b)(b1)(3) ? a0b(()b)22() 0 0 0 0

(32) (33) (34)

and

K () = ? bb1 (()) K ?1 () ? bb2(()) K ?2 () ? bb3 (()) K ?3 () for K 5: 0 0 0

(35)

P

As pointed out in Remark 2, the identity 3i=0 bi () = 0 may be used to reduce the order of the linear relation (35). Dene uK () := K () ? K ?1() (36) for K 3. Then (35) becomes b 1 () uK () = ? 1 + b () uK ?1() + bb3 (()) uK ?2 () for K 5: 0 0

The solution to this linear relation of order 2 is given by [5, Theorem 2.2, p. 48]

uK () = 1 ()K d1 () + 2 ()K d2 () for K 3

(37)

with 1 () and 2 () the (distinct) roots of the polynomial b0 ()x2 +(b0 ()+ b1 ())x ? b3 (), namely, p

i () = ?(b () +2bb(())) () ; i = 1; 2; 0

1

0

18

with () := (b0 () + b1 ())2 + 4 b0 ()b3 (). (The proof that () > 0 for all > 0 is left to the reader.) The coecients di () in (37) are computed from the initial conditions u2 () and u3 (). We nd

d1 () = 2 () (3(()?)3(2(()))??(4(())) ? 3 ()) 1

2

1

d2 () = 4 () ? 3(())3 ?((1() )?(3 (()))? 2 ()) 2

2

1

where i () (i = 2; 3; 4) are given in (32)-(34). Combining (36) and (37), we nally get

K () = 2 () + d1 ()

K X i=2

1 ()i + d2 ()

K X i=2

2 ()i ; K 2:

Figure 6 represents what we have found to be typical behavior of the cost function C (; ; K ) as a function of . One can observe that the minimum is unique and obtained, say, at = ( ; K ). We have computed ( ; K ) for various values of the parameters , K and the probability p that a page has to be updated. Figure 7 displays the mapping K ! ( ; K ) for two values of the probability p (p = 1 and p = 0:5) and = 1; Figure 8 displays the mapping K ! ( ; K ) for two values of ( = 0:5 and = 2) and p = 1. As in the M/M/1/K case, we observe in Figure 8 that ( ; K ) is increasing when < 1 and decreasing when > 1.

5 Concluding remarks Simple queueing models (the M/M/1/K and M/G/1/K queues) of search engines have been proposed, analyzed, and optimized in order to nd the optimal number of robots to use. The cost function is a weighted sum of the loss probability and the starvation probability. Extensions of these models to dynamic models where the number of active robots may change over time as a function of the workload in the queue have been proposed in a companion paper [4]. Several interesting, open issues remain, including the situation where the robots are not homogeneous and/or are allocated to dierent parts of the network. For instance, one may wish to determine the optimal number of robots to be allocated to a given area.

References [1] Cohen, J. W ., The Single Server Queue, North-Holland Publishing Company, 1982. [2] Coman Jr., E. G, Liu, Z. and Weber, R. R., Optimal robot scheduling for web search engines, J. Scheduling, 1, pp. 14-22, 1998. 19

1.1 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.4

0.6

0.8

1

1.2

1.4

1.6

Figure 6: The mapping ! C (; ; K ) with 1 = 0:5, 2 = 0:1, p = 1, = 2 and K = 15

20

1.05 1.045 1.04 1.035 1.03 1.025 1.02 1.015 1.01 1.005 1 0

10

20

30

40

0

10

20

30

40

50

60

70

80

90

100

50

60

70

80

90

100

p=1

1

0.995

0.99

0.985

0.98

0.975

0.97

0.965

0.96

0.955

p = 0:5

Figure 7: The mapping K ! ( ; K ) with = 1 21

1

0.9

0.8

0.7

0.6

0.5

0.4 0

10

20

30

40

0

10

20

30

40

50

60

70

80

90

100

50

60

70

80

90

100

= 0:5

2 1.9 1.8 1.7 1.6 1.5 1.4 1.3 1.2 1.1 1

=2

Figure 8: The mapping K ! ( ; K ) with p = 1 22

[3] Kleinrock, L., Queueing Systems, Vol. I, Wiley & Sons, New York, 1975. [4] Talim, J., Liu, Z., Nain, P. and Coman, Jr., E. G., Optimal number of robots for search engines: the dynamic case. Preprint, Oct. 1997. [5] Sedgewick, R. and Flajolet, P., Analysis of algorithms, Addison-Wesley Publishing Company, 1996. [6] Titchmarsh, E. C., The Theory of Functions, Oxford University Press, 2nd Edition, 1939. [7] Wol, R. L., Poisson Arrivals See Time Averages, Opns. Res., 30, pp. 223-231, 1982.

23

Optimizing the Number of Robots for Web Search Engines

Optimizing the Number of Robots for Web Search Engines

Suggest Documents

A Machine Learning Architecture for Optimizing Web Search Engines

Controlling Robots in Web Search Engines - Sophia - Inria

Optimizing Result Prefetching in Web Search Engines ... - CS, Technion

Searching for people on Web search engines

Optimizing for the Number of Tests Generated in Search ... - CiteSeerX

Optimizing multimodal reranking for web image search

The effectiveness of Web search engines for retrieving relevant ...

Security of World Wide Web Search Engines

Security of World Wide Web Search Engines

Optimal Threshold Control by the Robots of Web Search Engines with ...

Editorial: Evaluating Web search engines

Multimedia Chinese Web Search Engines: A Survey

Web Usage Mining in Search Engines

Discovering Geographic Web Services in Search Engines

Multidisciplinary Perspectives on Web Search Engines

Handling temporal information in web search engines

web search engines pdf - Google Drive

Undergraduates' Preference between Web Search engines and

Web searching, search engines and Information Retrieval

Search Engines

The use of facets in Web search engines

Automated Functional Testing of Web Search Engines in the Absence ...

The Retrieval Effectiveness of Web Search Engines - arXiv

Boosting the Performance of Web Search Engines: Caching and ...