Analyzing Patterns in Large-Scale Graphs Using MapReduce in Hadoop
Authors
Abstract
Analyzing patterns in large-scale graphs, such as social networks (e.g. Facebook, LinkedIn, Twitter), has many applications, including community identification, blog analysis, intrusion detection, and spam detection. Currently, it is impossible to process the information in large-scale graphs with millions or even billions of edges on a single computer. In this paper, we take advantage of MapReduce, a programming model for processing large datasets, to detect important graph patterns using open-source Hadoop on Amazon EC2. The aim of this paper is to show how MapReduce cloud computing, applied to graph pattern detection, scales on real-world data. We implement Cohen’s MapReduce graph algorithms to enumerate patterns including triangles, rectangles, trusses, and barycentric clusters, using real-world data taken from the Stanford SNAP collection. In addition, we create a visualization algorithm to display the detected graph patterns. We also discuss the performance of the MapReduce graph algorithms.
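To make the triangle enumeration mentioned in the abstract concrete, here is a minimal in-memory sketch of a degree-ordered, MapReduce-style triangle enumeration in the spirit of Cohen's approach. It is not the paper's Hadoop implementation: the helper names `run_mapreduce` and `enumerate_triangles` are hypothetical, the jobs run in a single Python process rather than on a cluster, vertex degrees are looked up from a dictionary instead of being joined onto the edge records, and the input is assumed to be a simple undirected graph (no self-loops, no duplicate edges) with integer vertex ids.

```python
from collections import defaultdict
from itertools import combinations


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def enumerate_triangles(edges):
    """Return each triangle once, as a sorted tuple of vertex ids."""
    # Job 1: count the degree of every vertex.
    degree = dict(run_mapreduce(
        edges,
        mapper=lambda e: [(e[0], 1), (e[1], 1)],
        reducer=lambda v, ones: [(v, sum(ones))],
    ))

    def rank(v):
        # Order vertices by degree, breaking ties by vertex id.
        return (degree[v], v)

    # Job 2: bin each edge under its lower-rank endpoint, then emit every
    # pair of neighbours in a bin as an "open triad" keyed by that pair.
    def bin_edge(e):
        u, v = e
        low, high = (u, v) if rank(u) < rank(v) else (v, u)
        return [(low, high)]

    def open_triads(apex, neighbours):
        return [((a, b), apex) for a, b in combinations(sorted(neighbours), 2)]

    triads = run_mapreduce(edges, bin_edge, open_triads)

    # Job 3: join open triads with the edge list. A triad keyed by (a, b)
    # closes into a triangle only if the edge (a, b) is also present.
    EDGE = None  # marker value for "this record is an edge, not a triad"
    records = triads + [((min(u, v), max(u, v)), EDGE) for u, v in edges]

    def close_triads(pair, values):
        if EDGE not in values:
            return []
        return [tuple(sorted((apex, *pair)))
                for apex in values if apex is not EDGE]

    return sorted(run_mapreduce(records, lambda r: [r], close_triads))


edges = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (4, 5)]
print(enumerate_triangles(edges))  # [(1, 2, 3), (2, 3, 4)]
```

Binning each edge under its lower-degree endpoint means that only the lowest-ranked vertex of a triangle holds both of the other two vertices in its bin, so each triangle is reported exactly once and high-degree vertices do not generate a quadratic number of open triads.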
Similar resources
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. Current Hadoop, together with the rack-aware data placement strategy of the existing Hadoop Distributed File System, assumes a homogeneous cluster in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...
Radius Plots for Mining Tera-byte Scale Graphs: Algorithms, Patterns, and Observations
Given large, multi-million node graphs (e.g., Facebook, web-crawls, etc.), how do they evolve over time? How are they connected? What are the central nodes and the outliers of the graphs? We show that the Radius Plot (pdf of node radii) can answer these questions. However, computing the Radius Plot is prohibitively expensive for graphs reaching the planetary scale. There are two major contribut...
A Distributed Implementation of GXPath
In the last few years there has been an increasing number of application fields, like the Semantic Web, social networks, bioinformatics, astronomical databases, etc., where large graph datasets are analyzed, queried, and, more generally, manipulated. Graphs are usually queried by specifying reachability patterns through regular path expressions; this leads to the need for efficient and scalable...
Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations. The rapid development in the use of information technology in general, and network technology in particular, has led many organizations to make their applications available for use via electronic platforms hos...
An Improved Performance Evaluation on Large-Scale Data using MapReduce Technique
Abstract: In day-to-day life, the volume of data has increased enormously over time. This growth is becoming unmanageable on social networking sites such as Facebook and Twitter. In the past two years, the data flow has grown into the zettabyte range. A number of applications have been developed to handle big data. However, analyzing big data remains a very challenging task. Big Data refers to...