Parallelization techniques for accelerating PageRank computation

Authors

  • H. Migallón
  • V. Migallón
  • J. Penadés
Abstract

PageRank is a probability distribution that represents the likelihood that a person randomly clicking on links will arrive at any particular page. Let $G = [g_{ij}]_{i,j=1}^{n}$ be a Web graph adjacency matrix with elements $g_{ij} = 1$ when there is a link from page $j$ to page $i$, with $i \neq j$, and zero otherwise. From this matrix we can construct a transition matrix $P = [p_{ij}]_{i,j=1}^{n}$ as follows: $p_{ij} = g_{ij}/c_j$ if $c_j \neq 0$ and $0$ otherwise, where $c_j = \sum_{i=1}^{n} g_{ij}$, $1 \le j \le n$, is the number of out-links of page $j$. When every page has a nonzero number of out-links, the matrix $P$ is column stochastic. In this case, the PageRank vector can be obtained by solving $P\pi = \pi$. The Power method is one of the oldest and simplest iterative methods for solving this eigenvector problem. When the matrix $P \ge O$ is irreducible and stochastic, the Power method converges to the eigenvector corresponding to $\lambda_{\max} = 1$. However, the Web contains many pages without out-links. In this case, the matrix $P$ is not stochastic and the Power method cannot be used. Moreover, a Web graph does not satisfy the irreducibility condition. To overcome these difficulties, Page and Brin [2] change the transition matrix $P$ to a column stochastic matrix $\bar{P} = \alpha (P + v d^{T}) + (1 - \alpha) v e^{T}$, where $d \in \mathbb{R}^{n}$ is defined by $d_i = 1$ if and only if $c_i = 0$, and the vector $v \in \mathbb{R}^{n}$ is some probability distribution over pages. Then, setting $\alpha$ such that $0 < \alpha < 1$, the Power method can be used to compute the stationary distribution of the ergodic Markov chain defined by $\bar{P}\pi = \pi$.
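The construction in the abstract can be illustrated with a minimal, sequential sketch of the Power method applied to the modified matrix. It is not the authors' parallel implementation. Several things here are assumptions rather than details from the paper: the function name `pagerank_power`, the edge-list input, the uniform choice of the vector v, the value α = 0.85, the stopping tolerance, and the reading of e as the all-ones vector. The modified matrix is never formed explicitly; its product with the current iterate is computed from the out-link lists.

```python
def pagerank_power(links, n, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power method for P_bar * pi = pi (illustrative sketch, not the paper's code).

    links: iterable of (j, i) pairs meaning "page j links to page i".
    n:     number of pages, labelled 0 .. n-1.
    Returns (pi, iterations_used).
    """
    # Out-link sets: g_ij is 0/1 with i != j, so drop self-links and duplicates.
    out = [set() for _ in range(n)]
    for j, i in links:
        if i != j:
            out[j].add(i)

    v = [1.0 / n] * n   # assumption: uniform probability distribution v
    pi = v[:]           # starting vector

    iterations = 0
    for iterations in range(1, max_iter + 1):
        p_pi = [0.0] * n     # accumulates P * pi
        dangling = 0.0       # accumulates d^T * pi (rank held by pages with c_j = 0)
        for j in range(n):
            if out[j]:
                share = pi[j] / len(out[j])   # p_ij * pi_j with p_ij = 1 / c_j
                for i in out[j]:
                    p_pi[i] += share
            else:
                dangling += pi[j]

        total = sum(pi)      # e^T * pi, equal to 1 up to rounding
        # P_bar * pi = alpha * (P*pi + v * (d^T pi)) + (1 - alpha) * v * (e^T pi)
        new_pi = [alpha * (p_pi[i] + v[i] * dangling) + (1.0 - alpha) * v[i] * total
                  for i in range(n)]

        diff = sum(abs(a - b) for a, b in zip(new_pi, pi))  # 1-norm of the update
        pi = new_pi
        if diff < tol:
            break
    return pi, iterations


if __name__ == "__main__":
    # Hypothetical 4-page graph; page 3 has no out-links (a dangling page).
    edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)]
    ranks, used = pagerank_power(edges, 4)
    print(ranks, used)
```

Most of the work in each iteration is the sparse product of P with the current iterate, which is the part that parallel PageRank methods distribute across processes.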

Similar resources

Web-Site-Based Partitioning Techniques for Efficient Parallelization of the PageRank Computation

The efficiency of the PageRank computation is important since the constantly evolving nature of the Web requires this computation to be repeated many times. PageRank computation includes repeated iterative sparse matrix-vector multiplications. Due to the enormous size of the Web matrix to be multiplied, PageRank computations are usually carried out on parallel systems. Graph and hypergraph par...


A Web-Site-Based Partitioning Technique for Reducing Preprocessing Overhead of Parallel PageRank Computation

A power method formulation, which efficiently handles the problem of dangling pages, is investigated for parallelization of PageRank computation. Hypergraph-partitioning-based sparse matrix partitioning methods can be successfully used for efficient parallelization. However, the preprocessing overhead due to hypergraph partitioning, which must be repeated often due to the evolving nature of the...


Efficient Computation of PageRank

This paper discusses efficient techniques for computing PageRank, a ranking metric for hypertext documents. We show that PageRank can be computed for very large subgraphs of the web (up to hundreds of millions of nodes) on machines with limited main memory. Running-time measurements on various memory configurations are presented for PageRank computation over the 24-million-page Stanford WebBase...


Experiments with PageRank Computation

The PageRank algorithm is one of the most commonly used algorithms for determining the global importance of web pages. Due to the size of the web graph, which contains billions of nodes, computing a PageRank vector is very computationally intensive and may take anywhere from hours to months depending on the efficiency of the algorithm. This prompted many researchers to propose techniques to enhance ...


Fast PageRank Computation Via a Sparse Linear System (Extended Abstract)

The research community has devoted increased attention to reducing the computation time needed by Web ranking algorithms. Many efforts have been devoted to improving PageRank [4, 23], the well-known ranking algorithm used by Google. The core of PageRank exploits an iterative weight assignment of ranks to the Web pages, until a fixed point is reached. This fixed point turns out to be the (dominan...


Publication date: 2017