The Algorithm of Mining Frequent Closed Itemsets Based on Index Array

Authors

  • Haitao He
  • Shasha Feng
  • Jiadong Ren
  • Qian Wang
Abstract

The set of frequent closed itemsets exactly determines the complete set of frequent itemsets and is usually much smaller than the latter. This paper proposes Index-FCI, an algorithm for mining frequent closed itemsets based on an index array. A vertical BitTable is adopted to compress the dataset so that support can be counted quickly. To make use of the horizontal BitTable, an index array corresponding to the database is constructed, and a new concept, the GFI (Great Frequent Itemset), is defined; GFIs can be found quickly from the index array and reduce the number of closedness checks. A hash table whose hash value is the support of an itemset is created so that frequent but non-closed itemsets can be removed by hash pruning. Index-FCI proceeds in four steps: first, the database is compressed into a BitTable; second, the index array corresponding to the dataset is constructed; third, GFIs are found from the index array and the hash table is created to store frequent closed itemsets; finally, the hash table is traversed to obtain all frequent closed itemsets. Experimental results show that Index-FCI is suitable for mining frequent closed itemsets.
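The BitTable, index array, and support-keyed hash pruning described above can be illustrated with a minimal Python sketch. It is not the authors' Index-FCI; it rests on several assumptions. The index-array definition used here (for each item, the other items present in every transaction that contains it) is borrowed from the general BitTable literature rather than taken from this paper. GFI is not modelled because its definition is not given here. Frequent itemsets are enumerated naively, so the sketch only suits toy data. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_bittable(transactions):
    """Vertical BitTable: one integer bitmask per item.
    Bit t of table[item] is set when transaction t contains the item."""
    table = defaultdict(int)
    for t, trans in enumerate(transactions):
        for item in trans:
            table[item] |= 1 << t
    return dict(table)


def support(itemset, table, n):
    """Support of an itemset = popcount of the AND of its item bitmasks."""
    mask = (1 << n) - 1  # start with all n transactions selected
    for item in itemset:
        mask &= table[item]  # keep only transactions that also contain `item`
    return bin(mask).count("1")


def build_index_array(table):
    """Hypothetical index array: for each item i, the other items j that occur
    in every transaction containing i (bitmask of i is a subset of that of j)."""
    return {
        i: {j for j, bj in table.items() if j != i and bi & bj == bi}
        for i, bi in table.items()
    }


def frequent_itemsets(table, n, min_sup):
    """Naive level-wise enumeration (toy data only): {frozenset: support}."""
    items = sorted(i for i in table if support((i,), table, n) >= min_sup)
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            s = support(combo, table, n)
            if s >= min_sup:
                result[frozenset(combo)] = s
                found = True
        if not found:  # no frequent k-itemset, so no frequent (k+1)-itemset
            break
    return result


def closed_itemsets(freq):
    """Hash pruning sketch: bucket itemsets by support (the hash key); an
    itemset is non-closed iff a proper superset sits in the same bucket."""
    buckets = defaultdict(list)
    for itemset, s in freq.items():
        buckets[s].append(itemset)
    closed = {}
    for s, group in buckets.items():
        for x in group:
            if not any(x < y for y in group):  # `<` is proper-subset on frozensets
                closed[x] = s
    return closed


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c", "d"}, {"b", "d"}]
n = len(transactions)
table = build_bittable(transactions)
index = build_index_array(table)
closed = closed_itemsets(frequent_itemsets(table, n, min_sup=2))
for itemset, s in sorted(closed.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), s)
# ['b'] 4
# ['a', 'b'] 3
# ['a', 'b', 'c'] 2
# ['b', 'd'] 2
```

Checking only within one support bucket is sufficient because a superset with the same support is itself frequent and therefore lands in the same bucket. On this toy data, `index["c"]` is `{"a", "b"}`, which matches the closed itemset `{a, b, c}`. For any item i with nonzero support, {i} ∪ index[i] is the closure of {i}, which is presumably why an index array can cut down closedness checks.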

Similar resources

Index-CloseMiner: An improved algorithm for mining frequent closed itemset

The set of frequent closed itemsets exactly determines the complete set of frequent itemsets and is usually much smaller than the latter. This paper proposes an improved algorithm for mining frequent closed itemsets. First, the index array is proposed, which is used to discover items that always appear together. Then, using a bitmap, an algorithm for computing the index array is pr...

Data sanitization in association rule mining based on impact factor

Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

An Efficient Mining Algorithm by Bit Vector Table for Frequent Closed Itemsets

Mining frequent closed itemsets in data streams is an important task in stream data mining. In this paper, an efficient mining algorithm (denoted as EMAFCI) for frequent closed itemsets in data streams is proposed. The algorithm is based on the sliding window model and uses a Bit Vector Table (denoted as BVTable) where the transactions and itemsets are recorded by the column and row vectors res...

DBV-Miner: A Dynamic Bit-Vector approach for fast mining frequent closed itemsets

Frequent closed itemsets (FCI) play an important role in pruning redundant rules quickly. Therefore, many algorithms for mining FCI have been developed. Algorithms based on vertical data formats have the advantage that they scan the database only once and compute the support of itemsets quickly. In recent years, BitTable (Dong & Han, 2007) and IndexBitTable (Song, Yang, & Xu, 2008) approaches h...

Simultaneous mining of frequent closed itemsets and their generators: Foundation and algorithm

Closed itemsets and their generators play an important role in frequent itemset and association rule mining. They allow a lossless representation of all frequent itemsets and association rules and facilitate mining. Some recent approaches discover frequent closed itemsets and generators separately. The Close algorithm mines them simultaneously but it needs to scan the database many times. Based...

Publication date: 2011