In-Stream Frequent Itemset Mining With Output Proportional Memory Footprint
نویسندگان
چکیده
We propose an online partial counting algorithm based on statistical inference that approximates itemset frequencies from data streams. The space complexity of our algorithm is proportional to the number of frequent itemsets in the stream at any time. Furthermore, the longer an itemset is frequent the closer is the approximation to its frequency, implying that the results become more precise as the stream evolves. We empirically compare our approach in terms of correctness and memory footprint to CARMA and Lossy Counting. Though our algorithm outperforms only CARMA in correctness, it requires much less space than both of these algorithms providing an alternative to Lossy Counting when the memory available is limited.
منابع مشابه
A Review on Algorithms for Mining Frequent Itemset Over Data Stream
Frequent itemset mining over dynamic data is an important problem in the context of data mining. The two main factors of data stream mining algorithm are memory usage and runtime, since they are limited resources. Mining frequent pattern in data streams, like traditional database and many other types of databases, has been studied popularly in data mining research. Many applications like stock ...
متن کاملA Two-Armed Bandit Collective for Examplar Based Mining of Frequent Itemsets with Applications to Intrusion Detection
Over the last decades, frequent itemset mining has become a major area of research, with applications including indexing and similarity search, as well as mining of data streams, web, and software bugs. Although several efficient techniques for generating frequent itemsets with a minimum support (frequency) have been proposed, the number of itemsets produced is in many cases too large for effec...
متن کاملAn Accelerator for Frequent Itemset Mining from Data Streams with Parallel Item Tree
Frequent itemset mining attempts to find frequent subsets in a transaction database. In this era of big data, demand for frequent itemset mining is increasing. Therefore, the combination of fast implementation and low memory consumption, especially for stream data, is needed. In response to this, we optimize an online algorithm, called Skip LC-SS algorithm [1], for hardware. In this paper, we p...
متن کاملCHARM: An Efficient Algorithm for Closed Itemset Mining
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It ...
متن کاملA New Algorithm for High Average-utility Itemset Mining
High utility itemset mining (HUIM) is a new emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds minimum threshold. The basic HUIM problem does not consider length of itemsets in its utility measurement and utility values tend to become higher for itemsets containing more items...
متن کامل