Continuous Similarity Computation over Streaming Graphs
Abstract
Large network analysis is a very important topic in data mining. A significant body of work in the area studies the problem of node similarity. One way to express node similarity is to associate with each node the set of 1-hop neighbors and compute the Jaccard similarity between these sets. This information can be used subsequently for more complex operations like link prediction, clustering or dense subgraph discovery. In this work, we study algorithms to monitor the result of a similarity join between nodes continuously, assuming a sliding window accommodating graph edges. Since the arrival of a new edge or the expiration of an existing one may change the similarity between several node pairs, the challenge is to maintain the similarity join result as efficiently as possible. Our theoretical study is validated by a thorough experimental evaluation, based on real-world as well as synthetically generated graphs, demonstrating the superiority of the proposed technique in comparison to baseline approaches.
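The setting described in the abstract can be illustrated with a minimal sketch. This is a naive recompute-on-change baseline, not the technique proposed in the paper. The class name `SlidingWindowJaccardJoin`, the count-based window, the undirected edges without self-loops, and the strictly positive similarity threshold are all assumptions made for illustration. When an edge (u, v) arrives or expires, only the neighbor sets of u and v change, so only pairs involving u or v need to be re-evaluated, and any node with nonzero Jaccard similarity to u must be a 2-hop neighbor of u.

```python
from collections import defaultdict, deque


class SlidingWindowJaccardJoin:
    """Naive maintenance of a Jaccard similarity join over a count-based
    sliding window of undirected edges (illustrative assumption, not the
    paper's algorithm). Node ids are assumed to be mutually comparable."""

    def __init__(self, window_size, threshold):
        if threshold <= 0:
            raise ValueError("threshold must be > 0")
        self.window_size = window_size
        self.threshold = threshold
        self.window = deque()               # edges in arrival order
        self.edge_count = defaultdict(int)  # multiplicity of each edge in the window
        self.neighbors = defaultdict(set)   # 1-hop neighbor sets
        self.result = {}                    # (a, b) with a < b -> similarity

    def _jaccard(self, a, b):
        na = self.neighbors.get(a, set())
        nb = self.neighbors.get(b, set())
        union = len(na | nb)
        return len(na & nb) / union if union else 0.0

    def _add(self, u, v):
        key = tuple(sorted((u, v)))
        self.edge_count[key] += 1
        if self.edge_count[key] == 1:       # edge newly present in the graph
            self.neighbors[u].add(v)
            self.neighbors[v].add(u)
            return True
        return False

    def _remove(self, u, v):
        key = tuple(sorted((u, v)))
        self.edge_count[key] -= 1
        if self.edge_count[key] == 0:       # last copy of the edge expired
            del self.edge_count[key]
            self.neighbors[u].discard(v)
            self.neighbors[v].discard(u)
            return True
        return False

    def _refresh(self, node):
        # Drop every stored pair involving `node`, then re-evaluate it
        # against its 2-hop neighbors (the only nodes sharing a neighbor).
        for pair in [p for p in self.result if node in p]:
            del self.result[pair]
        candidates = {x for n in self.neighbors.get(node, ())
                      for x in self.neighbors.get(n, ())}
        candidates.discard(node)
        for x in candidates:
            s = self._jaccard(node, x)
            if s >= self.threshold:
                self.result[tuple(sorted((node, x)))] = s

    def insert(self, u, v):
        """Append edge (u, v), expire the oldest edge if the window is
        full, and return the current join result."""
        touched = set()
        self.window.append((u, v))
        if self._add(u, v):
            touched |= {u, v}
        if len(self.window) > self.window_size:
            old_u, old_v = self.window.popleft()
            if self._remove(old_u, old_v):
                touched |= {old_u, old_v}
        for node in touched:
            self._refresh(node)
        return self.result
```

In this sketch, each call to `insert` touches at most four nodes, but refreshing a node still scans its whole 2-hop neighborhood and the stored result, which is the cost that more efficient maintenance strategies aim to reduce.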
Similar resources
Adaptive Approximation-based Streaming Skylines for Similarity Search Query
A large database cannot simply be treated as a stream database, because streaming data is not only voluminous but also distributed, continuous, rapid, and time-varying. General techniques may therefore not suit streams well. For similarity search in stream processing, the accuracy required of approximate answers is more important. Therefore, we perform ...
Intractability of min- and max-cut in streaming graphs
We show that the exact computation of a minimum or a maximum cut of a given graph G is out of reach for any one-pass streaming algorithm, that is, for any algorithm that runs over the input stream of G’s edges only once and has a working memory of o(n) bits. This holds even if randomization is allowed.
Time Constrained Continuous Subgraph Search over Streaming Graphs
The growing popularity of dynamic applications such as social networks provides a promising way to detect valuable information in real time. Efficient analysis of high-speed data from dynamic applications is of great significance. Data from these applications can be easily modeled as a streaming graph. In this paper, we study the subgraph (isomorphism) search over streaming graph data t...
Continuous Spatiotemporal Trajectory Joins
Given the plethora of GPS and location-based services, queries over trajectories have recently received much attention. In this paper we examine trajectory joins over streaming spatiotemporal data. Given a stream of spatiotemporal trajectories created by monitored moving objects, the outcome of a Continuous Spatiotemporal Trajectory Join (CSTJ) query is the set of objects in the stream, which h...
READS: A Random Walk Approach for Efficient and Accurate Dynamic SimRank
Similarity among entities in graphs plays a key role in data analysis and mining. SimRank is a widely used and popular measurement to evaluate the similarity among the vertices. In real-life applications, graphs do not only grow in size, requiring fast and precise SimRank computation for large graphs, but also change and evolve continuously over time, demanding an efficient maintenance process ...