Finding Connected Components on Map-reduce in Logarithmic Rounds
Abstract
Given a large graph G = (V, E) with millions of nodes and edges, how do we compute its connected components efficiently? Recent work addresses this problem in map-reduce, where a fundamental trade-off exists between the number of map-reduce rounds and the communication of each round. Denoting by d the diameter of the graph and by n the number of nodes in the largest component, all prior techniques for map-reduce require either a linear, Θ(d), number of rounds or a quadratic, Θ(n|V| + |E|), communication per round. We propose two efficient map-reduce algorithms: (i) Hash-Greater-to-Min, a randomized algorithm based on PRAM techniques that requires O(log n) rounds and O(|V| + |E|) communication per round, and (ii) Hash-to-Min, a novel algorithm that provably finishes in O(log n) iterations on path graphs. The proof technique used for Hash-to-Min is novel but not tight, and in practice Hash-to-Min is actually faster than Hash-Greater-to-Min. We conjecture that Hash-to-Min requires 2 log d rounds and 3(|V| + |E|) communication per round, as demonstrated in our experiments. Using secondary sorting, a standard map-reduce feature, we scale Hash-to-Min to graphs with very large connected components. Our techniques for connected components can be applied to clustering as well. We propose a novel algorithm for agglomerative single-linkage clustering in map-reduce. This is the first map-reduce algorithm for clustering in at most O(log n) rounds, where n is the size of the largest cluster. We show the effectiveness of all our algorithms through detailed experiments on large synthetic as well as real-world datasets.
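The abstract names Hash-to-Min but does not state its update rule. The following is a minimal single-process sketch, assuming the rule as it is usually described: every node v keeps a cluster C_v (initially v and its neighbours), sends the whole of C_v to the smallest node in C_v, and sends only that smallest node to every member of C_v. The function name `hash_to_min`, the `max_rounds` cap, and the in-memory "inbox" that stands in for the shuffle phase are assumptions made for illustration, not part of the paper's implementation.

```python
from collections import defaultdict


def hash_to_min(nodes, edges, max_rounds=100):
    """Simulate Hash-to-Min rounds in memory (illustrative sketch only).

    Returns (labels, rounds): labels maps each node to the smallest node
    of its connected component; rounds counts simulated rounds, including
    the final round that detects no change.
    """
    # Initial cluster of v: v itself plus its direct neighbours.
    clusters = {v: {v} for v in nodes}
    for u, v in edges:
        clusters[u].add(v)
        clusters[v].add(u)

    rounds = 0
    while rounds < max_rounds:
        # "Map" phase: emit messages keyed by destination node.
        inbox = defaultdict(set)
        for v, cluster in clusters.items():
            v_min = min(cluster)
            inbox[v_min] |= cluster      # whole cluster goes to the minimum
            for u in cluster:
                inbox[u].add(v_min)      # every member learns the minimum

        # "Reduce" phase: a node's new cluster is the union of what it received.
        new_clusters = {v: set(inbox[v]) for v in nodes}
        rounds += 1
        if new_clusters == clusters:     # fixed point reached
            break
        clusters = new_clusters

    labels = {v: min(cluster) for v, cluster in clusters.items()}
    return labels, rounds


if __name__ == "__main__":
    # A path 1-2-3-4, a separate edge 5-6, and an isolated node 7.
    labels, rounds = hash_to_min(
        [1, 2, 3, 4, 5, 6, 7], [(1, 2), (2, 3), (3, 4), (5, 6)]
    )
    print(labels, rounds)
```

At the fixed point the smallest node of each component holds the whole component and every other node holds only that smallest node, so `min(cluster)` serves as a component label. A real map-reduce job would replace the `inbox` dictionary with the framework's shuffle, and the round counts of this toy simulation should not be read as evidence for the conjectured bounds.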
Similar resources
MST in O(1) Rounds of the Congested Clique
We present a distributed randomized algorithm that finds a Minimum Spanning Tree (MST) of a given graph in O(1) rounds, with high probability, in the congested clique model. The input graph in the congested clique model is a graph of n nodes, where each node initially knows only its incident edges. The communication graph is a clique with limited edge bandwidth: each two nodes (not necessarily neigh...
A Stabilizing Algorithm for Finding Biconnected Components
In this paper, a self-stabilizing algorithm is presented for finding biconnected components of a connected undirected graph on a distributed or network model of computation. The algorithm is resilient to transient faults, therefore, it does not require initialization. The proposed algorithm is based on stabilizing BFS construction and bridge-finding algorithms. Upon termination of these algorit...
An Efficient Parallel Strategy for Computing K-terminal Reliability and Finding Most Vital Edge in 2-trees and Partial 2-trees
We develop a parallel strategy to compute K-terminal reliability in 2-trees and partial 2-trees. We also solve the problem of finding the most vital edge with respect to K-terminal reliability in partial 2-trees. Our algorithms take O(log n) time with C(m, n) processors on a CRCW PRAM, where C(m, n) is the number of processors required to find connected components of a graph with m edges and n verti...
Distributed Approximation Algorithms in Unit-Disk Graphs
We give distributed approximation schemes for the maximum matching problem and the minimum connected dominating set problem in unit-disk graphs. The algorithms are deterministic and run in a poly-logarithmic number of rounds in the message passing model, and the approximation error can be made O(1/log^k |G|), where |G| is the order of the graph and k is a positive integer.
Automatic Service Composition Based on Graph Coloring
Web services are independent software components published on the Internet by service providers and invoked at users' request. However, in many cases, no single service in the service repository can satisfy the applicant's request. Service composition provides new components by using an interactive model to accelerate the programs. Prior to service comp...
Journal: CoRR
Volume: abs/1203.5387
Publication date: 2011