PrePost+: An efficient N-lists-based algorithm for mining frequent itemsets via Children-Parent Equivalence pruning

Authors

  • Zhi-Hong Deng
  • Sheng-Long Lv
Abstract

N-list is a novel data structure proposed in recent years. It has been proven to be very efficient for mining frequent itemsets. In this paper, we present PrePost+, a high-performance algorithm for mining frequent itemsets. It employs N-list to represent itemsets and directly discovers frequent itemsets using a set-enumeration search tree. In particular, it employs an efficient pruning strategy named Children–Parent Equivalence pruning to greatly reduce the search space. We have conducted extensive experiments to evaluate PrePost+ against three state-of-the-art algorithms (PrePost, FIN, and FP-growth*) on six real datasets. The experimental results show that PrePost+ is the fastest on all datasets. Moreover, PrePost+ also demonstrates good performance in terms of memory consumption, since it uses only a little more memory than FP-growth* and less memory than PrePost and FIN.

Frequent itemset mining, first proposed by Agrawal, Imielinski, and Swami (1993), has become a popular data mining technique and plays a fundamental role in many important data mining tasks such as mining associations, correlations, and episodes. Since the first proposal of this data mining task, there have been hundreds of follow-up research publications (Han, Cheng, Xin, & Yan, 2007). Although many algorithms have been proposed, designing efficient mining methods remains one of several key open research problems. We recently presented an algorithm called PrePost (Deng et al., 2012) for mining frequent itemsets. The high efficiency of PrePost is achieved by: (1) employing a novel data structure named N-list to represent itemsets; and (2) exploiting the single-path property of N-list to directly discover frequent itemsets without generating candidate itemsets in some cases. The experiments in Deng et al. (2012) show that PrePost runs faster than some state-of-the-art mining algorithms, including FP-growth and FP-growth*.
Although PrePost uses the single-path property of N-list to prune the search space, it still suffers from too many candidates because it employs an Apriori-like approach for mining frequent itemsets. In this paper, we propose a new algorithm called PrePost+, which effectively avoids this problem. PrePost+ employs N-list to represent itemsets and directly discovers frequent itemsets in an itemset & N-list search tree. To avoid repetitive search, it also adopts Children–Parent Equivalence pruning to greatly reduce the search space. For evaluating the performance of PrePost+, we conduct …
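
The idea of searching a set-enumeration tree with an equivalence-based pruning rule can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: it represents each itemset by a plain set of transaction ids rather than an N-list, and it assumes the pruning rule takes its common "equal support" form (if extending an itemset P with item i does not change the support, i is absorbed into P instead of spawning its own subtree). The function name `mine_frequent_itemsets` and all variable names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def mine_frequent_itemsets(
    transactions: Sequence[Set[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its absolute support count."""
    # Vertical layout: item -> ids of the transactions that contain it.
    # (Assumption: tidsets stand in for the paper's N-lists.)
    tidsets: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    # Frequent single items, most frequent first (ties broken by name).
    items = sorted(
        (i for i, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda i: (-len(tidsets[i]), i),
    )
    found: Dict[FrozenSet[str], int] = {}

    def expand(
        prefix: List[str],
        prefix_tids: Set[int],
        candidates: List[str],
        inherited: List[str],
    ) -> None:
        equivalent = list(inherited)  # items that never change the support
        children: List[Tuple[str, Set[int]]] = []
        for item in candidates:
            tids = prefix_tids & tidsets[item]
            if len(tids) == len(prefix_tids):
                # Same support as the parent: absorb the item, no subtree.
                equivalent.append(item)
            elif len(tids) >= min_support:
                children.append((item, tids))

        # The prefix plus any subset of the equivalent items is frequent
        # and has exactly the support of the prefix.
        support = len(prefix_tids)
        for size in range(len(equivalent) + 1):
            for extra in combinations(equivalent, size):
                itemset = frozenset(prefix) | frozenset(extra)
                if itemset:  # skip the empty itemset at the root
                    found[itemset] = support

        # Recurse only into the children that actually shrink the support.
        for pos, (item, tids) in enumerate(children):
            later = [c for c, _ in children[pos + 1:]]
            expand(prefix + [item], tids, later, equivalent)

    expand([], set(range(len(transactions))), items, [])
    return found


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c", "d"}]
    for itemset, count in sorted(
        mine_frequent_itemsets(db, 2).items(), key=lambda kv: sorted(kv[0])
    ):
        print(sorted(itemset), count)
```

In the small database above, item `a` occurs in every transaction, so it is absorbed at the root and the search never builds a subtree for it; the itemsets containing `a` are produced by combining it with the itemsets found elsewhere in the tree.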


Similar resources

Fast mining frequent itemsets using Nodesets

Node-list and N-list, two novel data structures proposed in recent years, have been proven to be very efficient for mining frequent itemsets. The main problem with these structures is that both need to encode each node of a PPC-tree with pre-order and post-order codes. This makes them memory-consuming and inconvenient for mining frequent itemsets. In this paper, we propose Nodeset, a mo...


A novel approach for fast mining frequent itemsets use N-list structure based on MapReduce

Frequent pattern mining is one of the most significant topics in data mining. In recent years, many algorithms have been proposed for mining frequent itemsets. A new algorithm for mining frequent itemsets based on the N-list data structure, called the Prepost algorithm, has been presented. The Prepost algorithm is enhanced by implementing compact PPC-tree with the general tree. Prepost algorithm ...


AIM2: Improved implementation of AIM

We present AIM2-F, an improved implementation of the AIM-F [4] algorithm for mining frequent itemsets. Past studies have proposed various algorithms and techniques for improving the efficiency of the mining task. We presented AIM-F at FIMI’03; it combines several of these techniques into an algorithm that applies them dynamically according to the input dataset. The algorithm main featur...


AIM: Another Itemset Miner

We present a new algorithm for mining frequent itemsets. Past studies have proposed various algorithms and techniques for improving the efficiency of the mining task. We integrate a combination of these techniques into an algorithm that applies them dynamically according to the input dataset. The algorithm's main features include depth-first search with a vertical compressed database, ...


A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is an emerging field in data mining that has gained growing interest due to its various applications. The goal is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...



Journal:
  • Expert Syst. Appl.

Volume 42

Publication date 2015