The JXP Method for Robust PageRank Approximation in a Peer-to-Peer Web Search Network

Authors

  • Josiane Xavier Parreira (Max-Planck Institute for Informatics)
  • Carlos Castillo (Yahoo! Research)
  • Debora Donato (Yahoo! Research)
  • Sebastian Michel (Max-Planck Institute for Informatics)
  • Gerhard Weikum (Max-Planck Institute for Informatics)

Partially supported by the EU within the 6th Framework Programme under contract 001907 “Dynamically Evolving, Large Scale Information Systems” (DELIS).

Abstract

Link analysis on Web graphs and social networks forms the foundation for authority assessment, search result ranking, and other forms of Web and graph mining. The PageRank (PR) method is the most widely known member of this family. All link analysis methods perform eigenvector computations on a potentially huge matrix that is derived from the underlying graph, and the large size of the data makes this computation very expensive. Various techniques have been proposed for speeding up these analyses by partitioning the graph into disjoint pieces and distributing the partitions among multiple computers. However, all these methods require a priori knowledge of the entire graph and careful planning of the partitioning. This paper presents the JXP algorithm for computing PR-style authority scores of Web pages that are arbitrarily distributed over many sites of a peer-to-peer (P2P) network. Peers are assumed to compile their own data collections, for example, by performing focused Web crawls according to their interest profiles. This way, the Web graph fragments that reside at different peers may overlap and, a priori, peers do not know the relationships between different fragments.

The JXP algorithm runs at every peer, and it works by combining locally computed authority scores with information obtained from other peers by means of random meetings among the peers in the network. The computation on the combined input of two peers is based on a Markov-chain state-lumping technique, and can be viewed as an iterative approximation of global authority scores. JXP scales with the number of peers in the network. The computations at each peer are carried out on small graph fragments only, and the storage and memory demands per peer are on the order of the size of the peer’s locally hosted data. It is proven that the JXP scores converge to the true PR scores that one would obtain by a centralized PR computation on the global graph. The paper also discusses the issue of misbehaving peers that attempt to distort the global authority values by providing manipulated data in the peer meetings. An extended version of JXP, coined TrustJXP, provides a variety of countermeasures, based on statistical techniques, for detecting suspicious behavior and combining JXP rankings with reputation-based scores.
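
For orientation, the centralized PR computation that the abstract uses as the reference point can be sketched as a plain power iteration. This is a minimal illustration of standard PageRank only, not the JXP algorithm itself. The function name `pagerank`, the damping factor of 0.85, the uniform handling of pages without out-links, and the toy graph are all assumptions made for this sketch and do not come from the paper.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a small in-memory graph.

    links maps each page to the list of pages it links to.
    damping=0.85 is an assumed, commonly used value.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Score held by pages without out-links is spread uniformly (assumption).
        dangling = sum(scores[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_scores = {p: base for p in pages}

        # Each page passes an equal share of its damped score to its targets.
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * scores[p] / len(targets)
                for t in targets:
                    new_scores[t] += share

        # Stop once the total change between iterations is negligible.
        delta = sum(abs(new_scores[p] - scores[p]) for p in pages)
        scores = new_scores
        if delta < tol:
            break
    return scores


# Hypothetical four-page graph used only for illustration.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_graph)
```

On the global Web graph this loop is the expensive step. According to the abstract, JXP lets each peer run its computation on its own small fragment and refine the result through peer meetings instead.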

Related resources

JXP: Global Authority Scores in a P2P Network

This document presents the JXP algorithm for dynamically and collaboratively computing PageRank-style authority scores of Web pages distributed in a P2P network. In the architecture that we pursue, every peer crawls and indexes Web fragments at its discretion, driven by the thematic profile or overlay neighborhood of the peer. The JXP algorithm runs at every peer, and is initialized by a local ...

p2pDating: Real life inspired semantic overlay networks for Web search

We consider a network of autonomous peers forming a logically global but physically distributed search engine, where every peer has its own local collection generated by independently crawling the web. A challenging task in such systems is to efficiently route user queries to peers that can deliver high quality results and be able to rank these returned results, thus satisfying the users’ infor...

Proceedings of the ACM SIGIR 2005 Workshop on Heterogeneous and Distributed

We consider a network of autonomous peers forming a logically global but physically distributed search engine, where every peer has its own local collection generated by independently crawling the web. A challenging task in such systems is to efficiently route user queries to peers that can deliver high quality results and be able to rank these returned results, thus satisfying the users’ infor...

A Novel Caching Strategy in Video-on-Demand (VoD) Peer-to-Peer (P2P) Networks Based on Complex Network Theory

The popularity of video-on-demand (VoD) streaming has grown dramatically over the World Wide Web. Most users in VoD P2P networks have to wait a long time in order to access their requesting videos. Therefore, reducing waiting time to access videos is the main challenge for VoD P2P networks. In this paper, we propose a novel algorithm for caching video based on peers' priority and video's popula...

Knowing Where to Search: Personalized Search Strategies for Peers in P2P Networks

Optimizing and focusing search and results ranking in P2P networks becomes more and more important with the increasing size of these networks. Even though a few approaches have already started to investigate the computation of PageRank-like values in P2P environments, none so far has investigated how personalization could be added to it. This paper tackles the problem of distributedly computing...

Publication date: 2007