SubgraphRank: PageRank Approximation for a Subgraph or in a Decentralized System
Yao Wu, supervised by Louiqa Raschid
University of Maryland, College Park, Maryland
[email protected] ABSTRACT PageRank, a ranking metric for hypertext web pages, has received increased interests. As the Web has grown in size, computing PageRank scores on the whole web using centralized approaches faces challenges in scalability. Distributed systems like peer-to-peer(P2P) networks are employed to speed up PageRank. In a P2P system, each peer crawls web fragments independently. Hence the web fragment on one peer is incomplete and may overlap with other peers. The challenge is to compute PageRank on each web fragment, reflecting the global web graph. Another interesting case is focused crawler, where only pages in a web fragment are of interest. In this research, we study the following problem: Given a web fragment and the whole web structure, approximate the global PageRank scores on subgraph, without running PageRank on the whole Web. We refine the PageRank paradigm to take into consideration the links connecting external pages. We describe a weight assigning approach to convey information about the global graph. We propose an efficient algorithm called SubgraphRank to compute the PageRank scores of a subgraph and design the experiments to validate the algorithm. In P2P case, we will relax the assumption of the global graph in future work.
1. INTRODUCTION
The explosion of the amount of information available on the Web has made the ranking of web pages an unavoidable component of search engines. Since a hyperlink from one page to another usually implies an "endorsement" or "recommendation", link analysis plays an important role in determining the importance of web pages. PageRank [7, 22] and HITS [20] are two seminal works in the area. PageRank iteratively computes the score of a page based on the scores of its parent pages. HITS considers each web page to have two roles: hub and authority. The hub score estimates the value of a page's links to other pages, and the authority score estimates the importance
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, to post on servers or to redistribute to lists, requires a fee and/or special permission from the publisher, ACM. VLDB ‘07, September 23-28, 2007, Vienna, Austria. Copyright 2007 VLDB Endowment, ACM 978-1-59593-649-3/07/09.
of the page. Both algorithms are expensive because of the number of web pages involved in the computation. In January 2005, the indexable Web for search engines was estimated at more than 11.5 billion pages [16]. In light of this dramatic growth of the world-wide web, it is almost infeasible to compute PageRank scores for the whole web in memory. Efficient centralized computation of the PageRank algorithm has been studied in [17, 19]. An adaptive method is exploited in [17], where pages that have already converged are not recomputed in a new iteration. An extrapolation method is exploited in [19], where the higher-order terms in the ranking vector expansion are suppressed. In these works, however, the computation is implemented on a single server. The limitation of these centralized approaches is that the demand for memory becomes excessive, which quickly becomes the bottleneck of the computation.

Next we consider decentralized approaches to compute the ranking. Recent research efforts in distributed systems have addressed the case where the web graph is partitioned into web sites or domains [3, 8, 18, 25]. The basic idea is to compute the PageRank scores on each partitioned web site, then aggregate the rankings from the different sites. In these distributed frameworks, however, the Web is assumed to be cleanly partitioned into disjoint fragments. This makes the approaches above inapplicable to two important applications, which involve a single subgraph and multiple subgraphs, respectively.

The first intriguing application is the focused crawler [9, 12], also called a thematic crawler. A focused crawler is interested in collecting the subset of the Web related to a specific topic. The whole web can be considered a global graph, and the web fragment retrieved by a focused crawler can be considered a subgraph. In this case, only the PageRank scores of the local pages on the subgraph are of interest. There are two immediate solutions: PageRank on the subgraph, and PageRank on the global graph restricted to local pages.
The PageRank on the local graph cannot accurately represent the importance of a page, since edges between local pages and external pages, and edges between external pages, are overlooked; the PageRank on the global graph, on the other hand, pays the price of accurately computing PageRank scores for all pages. An in-between solution is desired that considers the global graph structure while circumventing the cost of computing PageRank on the whole web graph. The second important application is the distributed system. In a distributed or decentralized system, there are multiple peers or servers, each of which stores its own web graph
fragment. A user may ask queries on one peer, and ranked query answers that are available locally are presented to the user in order of importance in the context of the query. An important observation about this decentralized framework is that the web graph fragment on each peer is incomplete, as there can be a substantial number of external pages (web pages on remote peers) linked to local pages. This scenario introduces additional complexity. We motivate this application with peer-to-peer networks and meta-searchers as follows.

In recent years, peer-to-peer (P2P) networks have received great attention. The advent of P2P techniques further boosts web information retrieval by leveraging distributed computing power, storage, and connectivity [15, 24, 26]. In such an architecture, data and functionality are distributed across all the peers. Each peer is autonomous and can index its own fragment of the Web, so the web fragments on different peers may overlap. If the ranking is computed on each individual peer, it may lead to inconsistent and inaccurate scores for pages, as the local web graphs are incomplete and overlapping.

A similar situation arises with meta-searchers. A study shows that search engines are more different than people expected [1]. For the 500 most popular search terms, Google and Yahoo! shared only 3.8 of their top 10 results on average. Part of the reason behind this inconsistency is that the search engines fetch web pages following different crawling algorithms. According to a recent study [16], the major search engines, including Google, Yahoo!, MSN, and Ask/Teoma, fetch different portions of the whole indexable web. Figure 1 shows the percentage of the indexable web fetched by each search engine and their overlaps. It stands to reason that a meta-searcher helps to better aggregate relevant results, which may require ranking computation on multiple subgraphs.

Figure 1: The percentage of the indexable web that lies in each search engine's index ([16])

The JXP algorithm (Juxtaposed Approximate PageRank) [23] considers a fully decentralized peer-to-peer system where the subgraphs on peers overlap. However, the JXP algorithm is expensive because the PageRank algorithm needs to be executed multiple times on each peer while the peers exchange information about their local web fragments with each other. It converges to the true PageRank scores. In this paper, we study the following problem: given a web fragment and the whole web structure, approximate the global PageRank scores on a subgraph without running PageRank on the whole Web. Our solution is an algorithm, SubgraphRank, that computes PageRank on a subgraph under the assumption that the global graph structure is known. For each subgraph, we introduce an artificial page, called the external node, to represent all web pages that are unavailable in the local graph. We also assign weights to all edges to reflect the structure of the global graph. When dealing with overlapping peers, instead of relying on meetings of different peers to exchange local PageRank scores and merge local web graphs like the JXP algorithm, SubgraphRank gathers information about the global graph by adding a synthetic page and weights before computing the ranking. Our contribution is as follows:
• An intelligent method to assign weights to a web graph fragment, where all external pages are collapsed into one artificial page.
• An efficient algorithm to estimate the PageRank scores based on the weighted graph, which converges to a unique ranking vector.

The rest of the document is organized as follows. Section 2 discusses related work. Section 3 presents our approach to the problem. Section 4 discusses removing the assumption of a special server holding the global web graph. The experiment plan is proposed in Section 5, and Section 6 concludes.
2. RELATED WORK
In this section, we summarize research on distributed PageRank evaluation and local PageRank.
2.1 PageRank
We refer to surveys [5, 6] for a complete description of related work. PageRank was introduced in [7, 22]. The underlying assumption is that links between pages confer authority. A link from page i to page j is evidence that i is suggesting that j is important. The importance contributed to page j from i is inversely proportional to the outdegree of i. Let D_i be the outdegree of page i. The corresponding random walk on the directed web graph can be expressed by a transition matrix A as follows:

    A[i, j] = 1/D_i  if there is an edge from i to j,
    A[i, j] = 0      otherwise.

Let R be the PageRank vector over the web pages. Initially, R can be an arbitrary vector representing the starting probabilities of visiting web pages. Let ε be the probability of web surfers following hyperlinks and (1 − ε) the probability of surfers making a random jump to a page, where ε is usually set to 0.85 [22]. The personalization vector P can be used to bias PageRank toward certain pages. In standard PageRank,

    P = [1/n]_{n×1}

The PageRank ranking vector is recursively defined as:

    R = ε · A^T · R + (1 − ε) · P

According to the Ergodic Theorem for Markov chains, R converges for the web graph, since the web graph is generally aperiodic and irreducible [19, 21].
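As a concrete illustration, the iteration above can be sketched in Python with dense matrices. This is our own toy sketch, not the original implementation: the function name and the tiny example graph are illustrative, and dangling pages (pages with no outlinks) are not handled.

```python
import numpy as np

def pagerank(edges, n, eps=0.85, tol=1e-10):
    """Power iteration for R = eps * A^T R + (1 - eps) * P."""
    # Transition matrix A: A[i, j] = 1/D_i if there is an edge i -> j.
    outdeg = np.zeros(n)
    for i, _ in edges:
        outdeg[i] += 1
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = 1.0 / outdeg[i]
    P = np.full(n, 1.0 / n)   # uniform personalization vector
    R = np.full(n, 1.0 / n)   # arbitrary start vector
    while True:
        R_new = eps * (A.T @ R) + (1 - eps) * P
        if np.abs(R_new - R).sum() < tol:
            return R_new
        R = R_new

# Tiny 3-page example: a cycle 0 -> 1 -> 2 -> 0, so all scores are equal.
scores = pagerank([(0, 1), (1, 2), (2, 0)], 3)
```

By symmetry of the cycle, the fixed point is the uniform vector, so each score is 1/3 and the scores sum to 1.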
2.2 Distributed PageRank without overlaps and local PageRank
To overcome the computation cost of the centralized PageRank algorithm on large graphs, there has been past research on PageRank in distributed environments. Different algorithms have been designed to utilize the block structure of the Web [3, 8, 18, 25]. In these papers, the entire web is considered to be cleanly partitioned into disjoint web sites and domains. The strategy is usually to first compute the local ranking for each graph, considering only intra-site links, then compute the site rankings considering inter-site links, and finally aggregate the local scores with the rankings of the web sites. [3] presents a ranking algebra to deal with rankings at different granularity levels. In [8], a random walk used to determine the importance of web sites is defined by inter-site links as well as the local PageRank scores of individual pages. [18] defines the random walk for web sites differently, while employing the same information; the local ranking scores are then used as the start vector for true PageRank. [25] proposes a slightly improved algorithm to compute local PageRank: for each web site, an artificial page is added to represent all out-of-domain pages, and the stochastic transition matrix entries are defined by the sum of the local PageRank values of all source pages in the starting site. While these techniques examined ways to efficiently estimate PageRank scores, they are applicable neither to arbitrarily overlapping web fragments nor to a single subgraph. There has also been some work on subgraphs: [10] estimates local PageRank values by expanding a small subgraph surrounding the node of interest, and the estimation is made on this subgraph, usually of a few dozen to a few hundred nodes. In contrast, we target the estimation of PageRank for a subgraph of possibly large size in this paper.
2.3 PageRank for decentralized system with overlaps (Algorithm JXP)
Fully decentralized PageRank for overlapping web fragments has received little attention. The JXP algorithm [23] is, to our knowledge, the only work that addresses a fully decentralized system with overlap. The JXP algorithm assumes that the total number of nodes in the global graph is known. Similar to ServerRank [25], where an artificial page is added to improve local PageRank, JXP adds a world node on each peer to represent all external pages. The links between local pages and external pages are replaced by links between local pages and the world node. When preprocessing the world node and its associated edges, the global link structure needs to be assumed. Figure 2 shows an example of a global graph. Nodes A, B, C, and D are pages on a local peer, and nodes X, Y, and Z in the cloud are external pages on remote peers. Figure 3 shows the substitution of all external pages by the world node Λ. Local PageRank is computed on this Λ-enriched graph. Peers exchange local PageRank scores when two peers meet: the union of the two local graphs is computed, along with the world node on each peer. Scores are merged using the average value if a page appears on both peers. Using the merged scores as initial scores, PageRank is computed on the merged graph and the local score list is updated.

Algorithm JXP(G_A, Λ_A, L_A)
▷ The input for peer A includes the local graph G_A, the world node Λ_A, and the score list L_A for G_A.
1. Contact a random peer B.
2. Merge graphs G_A and G_B, world nodes Λ_A and Λ_B, and score lists L_A and L_B for the two peers.
3. Run PageRank on the merged graph and update L_A.
4. Repeat until scores converge.

As we can see, the JXP algorithm is very expensive, since the graph merging in step 2 requires substantial computation time. A light-weight merging scheme is described in the paper that adds relevant information from other peers to the local world node, yet it is still computationally expensive to repeat the process over and over again. In addition, JXP converges to the true PageRank only under the condition that peers eventually meet a sufficient number of times to exchange information.
3. PROPOSED RESEARCH
We propose an algorithm for more efficient computation of PageRank on a subgraph or an individual peer. Note that the JXP algorithm gains global information indirectly through multiple meetings between peers. This inefficiency can be avoided by adjusting the probabilities of the random walk on the local graph according to the global graph structure. For data with typed links, as in databases, ObjectRank [4] adjusts the random walk according to authority transfer on the database schema graph. In the case of the Web graph, there is no schema level. Instead, we present a weighting scheme for the local web fragment, where the weights reflect the authority flow of the PageRank computation on the global graph. We define a random walk on the weighted local graph, which converges to the unique stationary distribution of the corresponding Markov chain. In this way, we utilize the graph structure directly for efficiency.

Recall that in [23, 25], the artificial node represents the external world by connecting the artificial node to local nodes based on the global web graph. However, adding unweighted edges between the artificial node and local pages cannot distinguish the case of one link from that of multiple links between a local page and external pages. Figure 3 provides an example: page C, which actually has 3 incoming edges from external pages, is treated the same as page D, which has only 1 incoming edge from external pages. Intuitively, however, we expect higher importance to be conveyed to page C from the external pages. We describe our novel strategy for assigning weights to edges, which distinguishes such cases, and discuss related issues in Section 3.1. Table 1 lists the symbols used to define our algorithm.

Table 1: Symbols used by SubgraphRank
  D_i        Outdegree of page i
  Λ          External node, the artificial node representing all external pages
  IN(i)      The set of incoming pages pointing to page i
  OUT^Λ(i)   The set of external pages that page i points to
  IN^Λ(i)    The set of external pages that point to page i
  EXT        The set of external pages
  G_l        Web graph fragment on a local peer, with n pages
  G_g        The global web graph, with N pages
  G_s        External node enriched web graph, with n + 1 pages
3.1 Assigning weights to edges
We assume that the total number of pages N in the global graph G_g is known. Also, as the JXP algorithm [23] assumed computational capability for the world node, we assume
Figure 2: A global graph spanning both the local peer and external peers.

Figure 3: An unweighted graph of the local peer, with a node replacing the external peers.
the global web graph is maintained on a special server. Under this framework, we preprocess the web graph fragment G_l on one peer and extend it to G_s, the external node enriched graph. While maintaining the global link structure can be expensive in space, it is still widely used to avoid computing the global graph by merging web fragments among peers on the fly; for instance, a global inverted index structure is assumed in [24].

Recall that the authority flow along an edge, as suggested by the transition matrix of the original PageRank algorithm, is tightly related to the inverse of the outdegree of pages. To simulate the random walk of PageRank, we introduce the external node Λ to represent all remote pages, add links between local pages and Λ as well as a self-loop edge on Λ, and then define weights based on the outdegrees of pages in the global graph. For edges between local pages, or edges from local pages to Λ, it is natural to define the weight based on the inverse of the outdegree of the starting page of the edge. More intriguing cases are the weights for edges from Λ to local pages and for the self-loop edge on Λ; in these cases, we define the weight as the average of the outdegree inverses of the starting pages of the edges. We propose the following rules to determine the weights:

1. The weight of an edge from local page i to local page j is 1/D_i.
2. The weight of the edge from local page i to Λ is Σ_{l ∈ OUT^Λ(i)} 1/D_i.
3. The weight of the edge from Λ to a local page i is (Σ_{l ∈ IN^Λ(i)} 1/D_l) / (N − n).
4. The weight of the self-loop edge on Λ is (Σ_{i ∈ EXT} Σ_{j ∈ EXT} A_{i,j}) / (N − n), that is, the sum of the outdegree flow between all external pages, averaged over the number of external pages.

Note that rule 2 for edges from local pages to Λ and rule 3 for edges from Λ to local pages are different. The reason behind this asymmetry is that Λ is the representation of the set of all external pages EXT; therefore, rules 3 and 4 adopt the average of the inverse outdegree values.

Our weight assigning strategy is tied to the probability transition matrix A for PageRank on the global graph. Without loss of generality, we let the local pages occupy the first n indexes in A, and the external pages are indexed from n + 1 to N. A is an N × N matrix, where entry A[i, j] has the value of the outdegree inverse of page i if there is an edge from i to j, indicating the probability of a random surfer following edge ij. Below we express the entries of the adjacency matrix M for the weighted graph G_s in terms of the entries of the transition matrix A. M is an (n + 1) × (n + 1) matrix, where all sums range over the external indexes n + 1, ..., N:

        | A_{1,1}  ...  A_{1,n}  Σ_i A_{1,i}                                |
        |  ...          ...       ...                                       |
    M = | A_{n,1}  ...  A_{n,n}  Σ_i A_{n,i}                                |
        | Σ_j A_{j,1}/(N−n)  ...  Σ_j A_{j,n}/(N−n)  Σ_i Σ_j A_{i,j}/(N−n) |
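The four weighting rules can be sketched as follows, assuming the global graph is available as an edge list with precomputed outdegrees. This is our own illustrative code, not part of the original proposal; all identifiers are ours, and the string 'LAMBDA' stands in for the external node Λ.

```python
def assign_weights(local_pages, edges, outdeg):
    """Edge weights for the Λ-enriched graph G_s following rules 1-4.

    local_pages: iterable of local page ids; edges: global edge list (i, j);
    outdeg: dict of global outdegree D_i per page.
    Returns {(src, dst): weight}, with 'LAMBDA' standing for Λ.
    """
    local = set(local_pages)
    ext = {p for e in edges for p in e} - local   # EXT: external pages seen in edges
    N_minus_n = len(ext)
    w = {}
    for i, j in edges:
        if i in local and j in local:             # rule 1: local -> local
            w[(i, j)] = 1.0 / outdeg[i]
        elif i in local:                          # rule 2: local i -> Λ, add 1/D_i per external target
            w[(i, 'LAMBDA')] = w.get((i, 'LAMBDA'), 0.0) + 1.0 / outdeg[i]
        elif j in local:                          # rule 3: Λ -> local j, averaged over N - n
            w[('LAMBDA', j)] = w.get(('LAMBDA', j), 0.0) + 1.0 / (outdeg[i] * N_minus_n)
        else:                                     # rule 4: external -> external feeds the self-loop
            key = ('LAMBDA', 'LAMBDA')
            w[key] = w.get(key, 0.0) + 1.0 / (outdeg[i] * N_minus_n)
    return w
```

On a small graph with local pages {0, 1} and external pages {2, 3}, each rule contributes exactly the per-edge terms described above, which makes the sketch easy to check by hand.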
With the global graph example in Figure 2, the weights assigned are shown in Figure 4. We provide some examples of edge weight calculation following these rules. According to rule 1, the weights of the edges AB, AC, CB, BD, CD, and DA are the outdegree inverses of their source pages. According to rule 2, since A points to only one external page, the weight of the edge from A to Λ is 1/3, the outdegree inverse of page A. Following rule 3, the weight of the edge from Λ to page C is

    (1/D_X + 1/D_Y + 1/D_Z) / 3 = (1/3 + 1/2 + 1/2) / 3 = 4/9.

Following rule 4, the self-loop edge weight is

    (2/D_X + 1/D_Y) / 3 = (2/3 + 1/2) / 3 = 7/18.

Figure 4: A weighted graph of the local peer, with a node replacing the external peers.

3.2 Random Walk on a weighted graph
Edges in a graph are usually associated with weights to indicate the strength of a relationship. Random walks on weighted graphs have been considered in different contexts. In [11], undirected weighted graphs are dealt with to design on-line algorithms. In [14], usage data are considered to
rank paths. Our work follows in this spirit: edge weights are utilized to channel the connectivity knowledge between pages from the global graph's perspective. Let w_ij denote the weight of the edge from i to j. Instead of defining the transition matrix of the Markov chain using outdegrees, the probability of transition from node i to j is given in terms of edge weights:

    A_s[i, j] = w_ij / Σ_{l ∈ IN(j)} w_lj  if there is an edge from i to j,
    A_s[i, j] = 0                          otherwise,

where IN(j) is the set of incoming pages pointing to page j and l is any page pointing to j. That is, the probability of transition is defined by the ratio of the edge weight w_ij to the sum of the weights of the edges pointing to j. Compared to the random walk on an unweighted graph, where surfers follow outgoing links with equal probability, here the probability of following a link is influenced by the edge weights as well as the graph structure.

Recall that the personalization vector in the original PageRank is defined as the uniform vector P = [1/n]_{n×1}, where each entry is 1 divided by the number of pages n; this implies that jumping to any page is equally probable. Instead, for SubgraphRank we define the personalization vector P_s according to the number of pages available on the local peer and the total number of pages on the global web. More specifically, the i-th entry of P_s is

    P_s[i] = 1/N          if page i is a local page,
    P_s[i] = (N − n)/N    if page i is the external node Λ.

This implies that a surfer randomly jumps to external pages just as if she were aware of the pages on other peers. Our SubgraphRank ranking vector is defined by the recursive formula

    R_s = ε · A_s^T · R_s + (1 − ε) · P_s

where ε follows the traditional value 0.85. SubgraphRank converges to a unique vector for two reasons. First, the transition matrix A_s is a stochastic matrix, as the sum of each column is 1.
Second, since we complement the random walk with jumps from dangling pages, the Markov chain we defined is irreducible and aperiodic. SubgraphRank thus satisfies the two conditions, irreducibility and aperiodicity, of the Ergodic Theorem for Markov chains [19].
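Putting the weighted transition matrix and the personalization vector together, a minimal sketch of the SubgraphRank iteration might look as follows. This is our own illustrative code: it assumes the weighted edges of G_s have already been computed (e.g., by the rules of Section 3.1), that Λ is the last entry of the page list, and it does not handle dangling pages.

```python
import numpy as np

def subgraph_rank(weights, pages, N, eps=0.85, tol=1e-10):
    """Sketch of SubgraphRank on the Λ-enriched graph G_s.

    weights: {(src, dst): w_ij} over the n + 1 nodes including Λ;
    pages: list of the n local pages followed by Λ; N: global page count.
    """
    idx = {p: k for k, p in enumerate(pages)}
    m = len(pages)                       # n + 1 nodes including Λ
    As = np.zeros((m, m))
    for (i, j), wij in weights.items():
        As[idx[i], idx[j]] = wij
    # Normalize each column j by Σ_{l in IN(j)} w_lj (guard empty columns).
    col_sums = As.sum(axis=0)
    As = As / np.where(col_sums > 0, col_sums, 1.0)
    n = m - 1
    Ps = np.full(m, 1.0 / N)
    Ps[-1] = (N - n) / N                 # Λ absorbs the external jump mass
    R = Ps.copy()
    while True:                          # R_s = eps * As^T R_s + (1 - eps) * Ps
        R_new = eps * (As.T @ R) + (1 - eps) * Ps
        if np.abs(R_new - R).sum() < tol:
            return dict(zip(pages, R_new))
        R = R_new
```

On a toy two-page fragment, a page with no incoming edges receives exactly its random-jump mass (1 − ε)/N, while Λ accumulates the external mass, which gives an easy sanity check.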
3.3 The algorithm
Let r(i) be the score of page i. We describe our SubgraphRank algorithm for each peer as follows:

Algorithm SubgraphRank(G_l, G_g)
▷ Initialization
1. Create the external node Λ enriched graph G_s.
2. Assign the initial score of Λ: r(Λ) ← (N − n)/N.
3. Assign the initial score of any local page i: r(i) ← 1/N.
4. Run PageRank on G_s based on A_s and P_s.

3.4 Discussions
With this preliminary idea of adding weights to the random walk on the local web graph, we take into account the extent of connectivity between a local page and the external pages, while only paying the price of a local PageRank execution. This also leaves room for enhancement and customization on several issues.
A special case of this framework is when the local peer contains the entire web graph. In this case, the external node Λ in G_s disappears, the adjacency matrix M reduces to the transition matrix A for the global graph, and SubgraphRank coincides with PageRank. With some special user profiles in a focused crawler, it is possible that the external web fragments are not as relevant as the local web graph. Depending on the characteristics of the crawler and the user profile, a different weighting strategy may be more appropriate. This leads to a customization/personalization problem, which will be studied in future work. Another important issue is the computation of the edge weights associated with Λ. We have assumed that the global link structure is given when assigning weights. In the JXP algorithm, peers communicate with each other multiple times during the PageRank computation to explore the external graph. In Section 4, we describe an approach where peers exchange information before the PageRank computation, without the global link structure being given.
4. REMOVING THE SPECIAL SERVER FOR THE GLOBAL WEB GRAPH FROM THE P2P NETWORK
In the previous sections, prior knowledge of the global web graph is assumed. This assumption is restrictive, since real-world applications may not have such a special server. In this proposal, we will consider a strategy to incrementally update the local web fragment on each peer as peers meet with each other. Peers maintain an adjacency list for the web fragment and the set of external pages corresponding to the external node Λ. Initially, the external node Λ corresponds to an empty set of external pages. The local peer meets with other peers in the network to update the adjacency list and the external page set. The weights are assigned on the local peer from this partial knowledge, until the local peer has acquired the complete global link structure required by SubgraphRank.
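One possible sketch of the meeting step, under the assumption that each peer stores a successor adjacency list and treats pages seen only as link targets as external. All structures and names here are our own illustrations of the strategy described above, not a specified protocol.

```python
def meet(local_adj, local_ext, remote_adj):
    """Fold a remote peer's adjacency list into the local view.

    local_adj: {page: set(successors)} known locally;
    local_ext: set of pages currently collapsed into the external node Λ;
    remote_adj: the other peer's {page: set(successors)}.
    """
    # Union the adjacency lists (the merged fragment of the Web).
    for page, succs in remote_adj.items():
        local_adj.setdefault(page, set()).update(succs)
    # Pages that appear only as link targets remain external, i.e. in Λ's set.
    known = set(local_adj)
    targets = {t for succs in local_adj.values() for t in succs}
    local_ext.clear()
    local_ext.update(targets - known)
    return local_adj, local_ext
```

After each meeting, the external page set shrinks as more pages gain their own adjacency entries, so the local view approaches the global link structure assumed by SubgraphRank.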
5. EVALUATION
Currently, we are in the process of implementing SubgraphRank. This section designs experiments to validate the SubgraphRank algorithm. We downloaded an open source Java crawler, Heritrix [2], to retrieve a set of web pages, and consider this set of pages as our global web graph. We then use a breadth-first crawl to collect fragments of the global web graph and assign the fragments to different peers. We plan to calculate Spearman's footrule distance [13] between the SubgraphRank order and the true global PageRank order, restricted to the local pages on each individual peer. We consider Spearman's footrule distance for two full rankings σ and π. One will be the SubgraphRank scores on G_s, limited to local pages; the other will be the PageRank scores on G_g, limited to local pages. Let σ(i) denote the rank position of page i and n be the cardinality of either list; σ(i) < π(i) means page i is ranked higher in σ. The distance is defined by

    F(σ, π) = Σ_{i=1}^{n} |σ(i) − π(i)|

The normalized distance is F(σ, π) divided by the maximum value n²/2.
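The footrule computation is straightforward; a sketch follows, with rankings represented as page-to-position maps (our own convention for this illustration).

```python
def footrule_distance(sigma, pi):
    """Normalized Spearman footrule between two full rankings.

    sigma, pi: {page: rank position}, both over the same n pages.
    F(sigma, pi) = sum_i |sigma(i) - pi(i)|, normalized by n^2 / 2.
    """
    assert sigma.keys() == pi.keys(), "rankings must cover the same pages"
    n = len(sigma)
    F = sum(abs(sigma[i] - pi[i]) for i in sigma)
    return F / (n * n / 2.0)
```

Identical rankings give distance 0; fully reversed rankings of an even number of pages attain the maximum, giving a normalized distance of 1.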
6. CONCLUSIONS
In this paper we outlined the applications of subgraph-based PageRank and of PageRank computation in the presence of overlap, which have not received much attention. We proposed an algorithm, SubgraphRank, for this problem, which we expect to be considerably more efficient than previous work, as it only needs to be executed once on the local peer. By assigning weights to the external node enriched graph and following with a random walk, SubgraphRank converges to a unique stationary distribution. We discussed possible improvements and customizations of the algorithm. The experimental evaluation is left as future work.
7. REFERENCES
[1] http://www.jux2.com/stats.php.
[2] http://crawler.archive.org/.
[3] K. Aberer and J. Wu. A framework for decentralized ranking in web information retrieval. In APWeb, pages 213–226, 2003.
[4] A. Balmin, V. Hristidis, and Y. Papakonstantinou. ObjectRank: Authority-based keyword search in databases. In VLDB, 2004.
[5] P. Berkhin. A survey on PageRank computing. Internet Mathematics, 2(1):73–120, 2005.
[6] A. Borodin, G. O. Roberts, J. S. Rosenthal, and P. Tsaparas. Link analysis ranking: algorithms, theory, and experiments. ACM Trans. Inter. Tech., 5(1):231–297, 2005.
[7] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In WWW7, pages 107–117, 1998.
[8] A. Z. Broder, R. Lempel, F. Maghoul, and J. Pedersen. Efficient PageRank approximation via graph aggregation. In WWW Alt. '04, pages 484–485, 2004.
[9] S. Chakrabarti, M. van den Berg, and B. Dom. Focused crawling: a new approach to topic-specific web resource discovery. In WWW '99, pages 1623–1640, 1999.
[10] Y.-Y. Chen, Q. Gan, and T. Suel. Local methods for estimating PageRank values. In CIKM '04, pages 381–389, 2004.
[11] D. Coppersmith, P. Doyle, P. Raghavan, and M. Snir. Random walks on weighted graphs and applications to on-line algorithms. J. ACM, 40(3):421–453, 1993.
[12] M. Diligenti, F. Coetzee, S. Lawrence, C. L. Giles, and M. Gori. Focused crawling using context graphs. In VLDB '00, pages 527–534, 2000.
[13] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for the web. In WWW '01, pages 613–622, 2001.
[14] M. Eirinaki and M. Vazirgiannis. Usage-based PageRank for web personalization. In ICDM '05, pages 130–137, 2005.
[15] A. Gulli and A. Signorini. Building an open source meta-search engine. In WWW '05 (posters), pages 1004–1005, 2005.
[16] A. Gulli and A. Signorini. The indexable web is more than 11.5 billion pages. In WWW '05 (posters), pages 902–903, 2005.
[17] S. Kamvar, T. Haveliwala, and G. Golub. Adaptive methods for the computation of PageRank. Technical report, Stanford University, 2003.
[18] S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub. Exploiting the block structure of the web for computing PageRank. Technical report, Stanford University, 2003.
[19] S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub. Extrapolation methods for accelerating PageRank computations. In WWW, pages 261–270, 2003.
[20] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, 1999.
[21] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[22] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
[23] J. X. Parreira, D. Donato, S. Michel, and G. Weikum. Efficient and decentralized PageRank approximation in a peer-to-peer web search network. In VLDB '06, pages 415–426, 2006.
[24] T. Suel, C. Mathur, J.-W. Wu, J. Zhang, A. Delis, M. Kharrazi, X. Long, and K. Shanmugasundaram. ODISSEA: A peer-to-peer architecture for scalable web search and information retrieval. In WebDB, pages 67–72, 2003.
[25] Y. Wang and D. J. DeWitt. Computing PageRank in a distributed internet search system. In VLDB, pages 420–431, 2004.
[26] D. Zeinalipour-Yazti, V. Kalogeraki, and D. Gunopulos. Information retrieval techniques for peer-to-peer networks. Computing in Science and Engineering, 6(4):20–26, 2004.