Mining Frequent Co-occurrence Patterns across Multiple Data Streams

Authors

  • Ziqiang Yu
  • Xiaohui Yu
  • Yang Liu
  • Wenzhu Li
  • Jian Pei
Abstract

This paper studies the problem of mining frequent co-occurrence patterns across multiple data streams, which has not been addressed by existing work. A co-occurrence pattern in this context refers to the case in which the same group of objects appears consecutively in multiple streams over a short time span, signaling tight correlations between these objects. The need for mining such patterns in real time arises in a variety of applications, ranging from crime prevention to location-based services to event discovery in social media. Since data streams are usually fast, continuous, and unbounded, existing frequent-pattern mining methods that require more than one pass over the data cannot be directly applied. Therefore, we propose DIMine and CooMine, two algorithms to discover frequent co-occurrence patterns across multiple data streams. DIMine is an Apriori-style algorithm based on an inverted index, while CooMine uses an in-memory data structure called the Seg-tree to compactly index the data that have already been seen but have not yet expired. CooMine employs a one-pass algorithm that uses the filter-and-refine strategy to obtain the co-occurrence patterns from the Seg-tree as updates to the streams arrive. Extensive experiments on two real datasets demonstrate the superiority of the proposed approaches over a baseline method, and show their respective applicability in different scenarios.
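To make the notion of a co-occurrence pattern concrete, the following is a minimal brute-force sketch in Python. It is not DIMine or CooMine, and it does not use an inverted index or a Seg-tree; it only illustrates one plausible reading of the definition above. The function name `naive_cooccurrence`, the event layout `(timestamp, stream_id, object_id)`, and the parameters `window`, `min_streams`, and `size` are all assumptions introduced for illustration, and the sketch relaxes "consecutively" to "within the same time window".

```python
from collections import defaultdict
from itertools import combinations


def naive_cooccurrence(events, window, min_streams, size):
    """Brute-force illustration (hypothetical, not the paper's algorithm).

    events: iterable of (timestamp, stream_id, object_id) tuples.
    Returns a dict mapping each group of `size` objects to the set of
    streams in which the whole group appears within `window` time units,
    keeping only groups seen in at least `min_streams` streams.
    """
    # Split the events by stream, ordered by timestamp.
    per_stream = defaultdict(list)
    for ts, stream, obj in sorted(events):
        per_stream[stream].append((ts, obj))

    support = defaultdict(set)  # object group -> streams containing it
    for stream, seq in per_stream.items():
        start = 0
        for end, (ts, obj) in enumerate(seq):
            # Expire events older than the window that ends at `ts`.
            while seq[start][0] < ts - window:
                start += 1
            # Objects still inside the window, excluding the newest one.
            others = {o for _, o in seq[start:end]} - {obj}
            # Each candidate group consists of the newest object plus
            # size - 1 objects that are still unexpired.
            for rest in combinations(sorted(others), size - 1):
                group = tuple(sorted((obj, *rest)))
                support[group].add(stream)

    return {g: s for g, s in support.items() if len(s) >= min_streams}


events = [
    (1, "s1", "a"), (2, "s1", "b"), (3, "s1", "c"),
    (10, "s2", "b"), (11, "s2", "a"), (30, "s2", "c"),
    (5, "s3", "a"), (20, "s3", "b"),
]
patterns = naive_cooccurrence(events, window=5, min_streams=2, size=2)
```

In this toy input, objects `a` and `b` appear close together in streams `s1` and `s2`, so they form the only pair that reaches the assumed threshold of two streams. The sketch enumerates every candidate group inside each window, which is the kind of cost that index structures such as those named in the abstract are meant to avoid.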


Related resources

A Study on Distributed Frequent Co-occurrence Patterns Algorithms across Multiple Data Streams

With the arrival of the big data era, data streams are fast, continuous, and unbounded, and the real-time requirements on stream processing results are very high. A large body of research has addressed frequent co-occurrence patterns across multiple data streams, but those algorithms are centralized and run on a single compute node. The memory of a single compute node and CPU c...


Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that appear frequently in the data set and mark them as frequent patterns so that users can make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...


Incremental Mining of Across-streams Sequential Patterns in Multiple Data Streams

Sequential pattern mining is the mining of data sequences for frequent, time-ordered sequential patterns, and it has wide application. Data streams are streams of data that arrive at high speed. Because memory capacity is limited and mining must happen in real time, the mining results need to be updated in real time. Multiple data streams are the simultaneous arrival of a plurality ...


Mining Sequential Patterns Across Data Streams

There have been extensive efforts toward mining frequent items or itemsets in a single data stream, but few efforts have been made to explore sequential patterns among literals in different data streams. In this paper, we define a challenging problem of mining frequent sequential patterns across multiple data streams. We propose an efficient algorithm, MILE, to manage the mining process. The propose...


An Algorithm Based on Horizontal Bit Vectors for Mining Frequent Patterns in Data Streams

Most algorithms for mining frequent patterns in data streams are based on structures like the FP-tree, whose complex mining methods make time and storage costs large compared to a bit-vector representation. In this paper, an algorithm based on Horizontal Bit vectors for mining Frequent Patterns in data Streams, HB-FPS, is proposed. HB-FPS is divided into two phases: in the online phase, it uses bit vectors to ho...



Publication date: 2015