Mining Recent Frequent Itemsets in Data Streams with Optimistic Pruning

Authors

  • Kun Li
  • Yongyan Wang
  • Manzoor Elahi
  • Xin Li
  • Hongan Wang
Abstract

A data stream is a massive, unbounded sequence of transactions generated continuously at a rapid rate, so processing the transactions as fast as possible within limited memory is an important problem. Although the problem has been studied extensively, most existing algorithms maintain many infrequent itemsets, which leads to heavy space usage and inefficient updates. In this paper, a new algorithm, called OPFI-stream, is proposed to mine all frequent itemsets, with accurate frequencies, from a sliding window over a data stream. OPFI-stream maintains a dynamically selected set of itemsets in a prefix-tree-based data structure. With an optimistic pruning strategy, many infrequent itemsets can be pruned during construction and updates. Mining all frequent itemsets with accurate frequencies then requires only a traversal of the tree. Experiments show that performance improves greatly even when the user-specified minimum support threshold is small.
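To make the ingredients named in the abstract concrete (a sliding window, a prefix tree of itemsets, and mining by tree traversal), here is a minimal Python sketch. It is not the OPFI-stream algorithm: the abstract does not describe the optimistic pruning strategy, so this version simply counts every itemset up to an assumed length cap and deletes nodes whose count drops to zero. The names `Node`, `SlidingWindowItemsets`, `max_len`, and `frequent` are hypothetical and chosen only for illustration.

```python
from collections import deque
from itertools import combinations


class Node:
    """One prefix-tree node; the path from the root spells an itemset."""

    def __init__(self):
        self.count = 0
        self.children = {}


class SlidingWindowItemsets:
    """Exact itemset counts over the last `window_size` transactions.

    Illustrative only: every itemset of up to `max_len` items is counted,
    which is the naive baseline that a pruning strategy would improve on.
    """

    def __init__(self, window_size, max_len=3):
        self.window_size = window_size
        self.max_len = max_len  # assumed cap to keep enumeration small
        self.window = deque()
        self.root = Node()

    def add(self, transaction):
        """Insert a new transaction and expire the oldest one if needed."""
        self.window.append(transaction)
        self._update(transaction, +1)
        if len(self.window) > self.window_size:
            self._update(self.window.popleft(), -1)

    def _update(self, transaction, delta):
        items = sorted(set(transaction))
        for k in range(1, min(self.max_len, len(items)) + 1):
            for itemset in combinations(items, k):
                self._walk(itemset, delta)

    def _walk(self, itemset, delta):
        node, path = self.root, []
        for item in itemset:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node()
            path.append((node, item, child))
            node = child
        node.count += delta
        # Drop nodes that no transaction in the window supports any more.
        for parent, item, child in reversed(path):
            if child.count == 0 and not child.children:
                del parent.children[item]
            else:
                break

    def frequent(self, min_support):
        """Return {itemset: count} for itemsets meeting the relative support."""
        threshold = min_support * len(self.window)
        result = {}
        stack = [((), self.root)]
        while stack:
            prefix, node = stack.pop()
            for item, child in node.children.items():
                # A superset is never more frequent than its prefix, so an
                # infrequent node's subtree can be skipped entirely.
                if child.count >= threshold:
                    itemset = prefix + (item,)
                    result[itemset] = child.count
                    stack.append((itemset, child))
        return result


miner = SlidingWindowItemsets(window_size=3)
for t in (["a", "b"], ["a", "b", "c"], ["a", "c"], ["b", "c"]):
    miner.add(t)
print(miner.frequent(min_support=0.5))
```

After the fourth transaction the window holds only the last three, so `("a", "b")` is supported once and falls below the 50% threshold. Enumerating all subsets costs time exponential in transaction length, which is the kind of overhead that maintaining a smaller, dynamically selected set of itemsets is meant to avoid.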

Similar resources

Mining Recent Frequent Itemsets in Sliding Windows over Data Streams

This paper considers the problem of mining recent frequent itemsets over data streams. As the data grows without limit at a rapid rate, it is hard to track the new changes of frequent itemsets over data streams. We propose an efficient one-pass algorithm in sliding windows over data streams with an error bound guarantee. This algorithm does not need to refer to obsolete transactions when ...

Incremental updates of closed frequent itemsets over continuous data streams

Online mining of closed frequent itemsets over streaming data is one of the most important issues in mining data streams. In this paper, we propose an efficient one-pass algorithm, NewMoment to maintain the set of closed frequent itemsets in data streams with a transaction-sensitive sliding window. An effective bit-sequence representation of items is used in the proposed algorithm to reduce the...

Accelerating Closed Frequent Itemset Mining by Elimination of Null Transactions

The mining of frequent itemsets is often challenged by the length of the patterns mined and also by the number of transactions considered for the mining process. Another acute challenge that concerns the performance of any association rule mining algorithm is the presence of "null" transactions. This work proposes a closed frequent itemset mining algorithm viz., Closed Frequent Itemset Mining a...

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is an emerging field in data mining that has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...

An Efficient Sliding Window Based Algorithm for Adaptive Frequent Itemset Mining over Data Streams

Mining frequent itemsets over high-speed, continuous, and infinite data streams is a challenging problem due to the changing nature of the data and the limited memory and processing capacities of computing systems. The sliding window is an interesting model for this problem since it does not need the entire history of received transactions and can handle concept change by considering only a limited range o...

Publication date: 2008