Arabesque: A System for Distributed Graph Mining - Extended version

Authors

  • Carlos H. C. Teixeira
  • Alexandre J. Fonseca
  • Marco Serafini
  • Georgos Siganos
  • Mohammed J. Zaki
  • Ashraf Aboulnaga
Abstract

Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms are not a good match for distributed graph mining problems, such as finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that match some “interestingness” criteria desired by the user. These algorithms are very important for areas such as social networks, semantic web, and bioinformatics. In this paper, we present Arabesque, the first distributed data processing platform for implementing graph mining algorithms. Arabesque automates the process of exploring a very large number of subgraphs. It defines a high-level filter-process computational model that simplifies the development of scalable graph mining algorithms: Arabesque explores subgraphs and passes them to the application, which must simply compute outputs and decide whether the subgraph should be further extended. We use Arabesque’s API to produce distributed solutions to three fundamental graph mining problems: frequent subgraph mining, counting motifs, and finding cliques. Our implementations require a handful of lines of code, scale to trillions of subgraphs, and represent in some cases the first available distributed solutions.

∗ Currently a PhD student at the Federal University of Minas Gerais, Brazil. This work was done while the author was at QCRI.
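The filter-process idea described in the abstract can be illustrated with a small single-machine sketch. This is not Arabesque's API (the abstract does not show it); the names `explore`, `filter_fn`, `process_fn`, and `find_cliques` are hypothetical, and deduplicating subgraphs with a set of vertex sets is an assumption made here for simplicity. The sketch grows connected vertex sets one vertex at a time, asks the application's filter whether a subgraph should be kept and extended, and hands the surviving subgraphs to the application's process function, here used to find cliques.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Graph = Dict[int, Set[int]]        # undirected graph as adjacency sets
Subgraph = FrozenSet[int]          # a vertex-induced subgraph, named by its vertices


def explore(graph: Graph,
            filter_fn: Callable[[Graph, Subgraph], bool],
            process_fn: Callable[[Graph, Subgraph], None],
            max_size: int) -> None:
    """Hypothetical exploration loop: the system enumerates subgraphs,
    the application only supplies filter_fn and process_fn."""
    frontier: Set[Subgraph] = {frozenset([v]) for v in graph}
    for size in range(1, max_size + 1):
        next_frontier: Set[Subgraph] = set()
        for sub in frontier:
            if not filter_fn(graph, sub):
                continue               # pruned: neither processed nor extended
            process_fn(graph, sub)     # application computes its output
            if size < max_size:
                # Extend by one neighboring vertex; the set removes duplicates.
                for v in sub:
                    for u in graph[v]:
                        if u not in sub:
                            next_frontier.add(sub | {u})
        frontier = next_frontier


def is_clique(graph: Graph, sub: Subgraph) -> bool:
    """Filter: keep a subgraph only if every pair of its vertices is adjacent."""
    return all(u in graph[v] for v in sub for u in sub if u != v)


def find_cliques(graph: Graph, max_size: int) -> List[Tuple[int, ...]]:
    """Application code: collect every clique with at most max_size vertices."""
    found: List[Tuple[int, ...]] = []
    explore(graph, is_clique,
            lambda g, s: found.append(tuple(sorted(s))), max_size)
    return sorted(found)


if __name__ == "__main__":
    # A triangle 1-2-3 with a pendant edge 3-4.
    example = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    for clique in find_cliques(example, max_size=3):
        print(clique)
```

Because a subgraph that fails the filter is never extended, any superset of a non-clique is skipped without being tested, which is the pruning behavior the filter-process model relies on. A distributed system would additionally need to partition the frontier across workers, which this sketch does not attempt.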


Similar resources

Mining Overlapping Communities in Real-world Networks Based on Extended Modularity Gain

Detecting communities plays a vital role in studying group-level patterns of a social network, and it can be helpful in developing recommendation systems such as movie, book, and friend recommendation. Most community detection algorithms can detect disjoint communities only, but in real-world scenarios, a node can be a member of more than one ...


Generalized Version Space Trees

We introduce generalized version space trees, a novel data structure that serves as a condensed representation in inductive databases for graph mining. Generalized version space trees allow for a comfortable representation of version spaces and a natural way to efficiently process inductive queries and operations on version spaces. In particular, we focus on using generalized version space tree...


An Improved Token-Based and Starvation Free Distributed Mutual Exclusion Algorithm

Distributed mutual exclusion is a fundamental problem of distributed systems that coordinates access to critical shared resources. It concerns how the various distributed processes access the shared resources in a mutually exclusive manner. This paper presents a fully distributed, improved token-based mutual exclusion algorithm for distributed systems. In this algorithm, a process which...


Simple linear algorithms for mining graph cores

Batagelj and Zaversnik proposed a linear algorithm for the well-known k-core decomposition problem. However, when k-cores are desired for a given k, we find that a simple linear algorithm requiring no sorting works for mining k-cores. In addition, this algorithm can be extended to mine (k1, k2, ..., kp)-cores from p-partite graphs in linear time, and this mining approach can be efficiently im...


A graph search algorithm: Optimal placement of passive harmonic filters in a power system

Harmonics in distribution systems have become an important problem due to the increase in nonlinear loads. This paper presents a new approach based on a graph algorithm for optimum placement of passive harmonic filters in a multi-bus system, which suffers from harmonic current sources. The objective of this paper is to minimize the network loss, the cost of the filter and the total harmonic disto...



Journal:
  • CoRR

Volume: abs/1510.04233

Publication date: 2015