A comprehensive method for discovering the maximal frequent set
Abstract
Association rule mining can be divided into two steps. The first step finds all frequent itemsets, that is, itemsets whose number of occurrences is greater than or equal to a user-specified threshold. The second step generates reliable association rules from the frequent itemsets found in the first step. Identifying all frequent itemsets in a large database dominates the overall performance of association rule mining. In this paper, we propose an efficient method, INCREMENTAL PINCER, for discovering the maximal frequent itemsets. The INCREMENTAL PINCER method combines the advantages of the DHP and Pincer-Search algorithms, which yields two benefits. First, the method can, in general, reduce the number of database scans. Second, it can filter out infrequent candidate itemsets and use the filtered itemsets to find the maximal frequent itemsets. Together, these two benefits reduce the overall computing time needed to find the maximal frequent itemsets. In addition, the INCREMENTAL PINCER method provides an efficient mechanism for constructing the maximal frequent candidate itemsets, which reduces the search space.
Keyterms: association rules, data mining, frequent itemsets, the INCREMENTAL PINCER method
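To make the terms in the abstract concrete, the following is a minimal sketch of the first mining step: a plain bottom-up, level-wise search for frequent itemsets, followed by a filter that keeps only the maximal ones. It is not the INCREMENTAL PINCER method, and it includes neither DHP hash filtering nor the top-down pass of Pincer-Search. The function names `frequent_itemsets` and `maximal_itemsets` and the toy transactions are assumptions made for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (bottom-up) search; returns {itemset: support count}.

    Illustrative baseline only, not the INCREMENTAL PINCER method.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items in one pass over the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates and keep a
        # candidate only if every (k-1)-subset is itself frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # One pass over the data per level to count the candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def maximal_itemsets(frequent):
    """Keep the frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical toy database of five transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
maximal = maximal_itemsets(frequent_itemsets(transactions, min_support=3))
print(sorted("".join(sorted(s)) for s in maximal))  # ['ab', 'ac', 'bc']
```

With a threshold of 3, the six frequent itemsets are summarized by three maximal ones; lowering the threshold to 2 makes {a, b, c} frequent, and it becomes the only maximal itemset. The baseline scans the data once per level, which is the cost the abstract says the proposed method aims to reduce.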
Similar resources
Discovering Maximal Frequent Item set using Association Array and Depth First Search Procedure with Effective Pruning Mechanisms
The first step of association rule mining is finding all frequent itemsets. The generation of reliable association rules is based on all frequent itemsets found in the first step. Obtaining all frequent itemsets in a large database dominates the overall performance of association rule mining. In this paper, an efficient method for discovering the maximal frequent itemsets is proposed. This met...
An Algorithm for Mining Maximum Frequent Itemsets Using Data-sets Condensing and Intersection Pruning
Discovering maximal frequent itemsets is a key issue in data mining. Apriori-like algorithms generate and test candidate itemsets, an approach that is highly time-consuming. To avoid generating either a vast volume of candidate itemsets or a frequent pattern tree, the DCIP algorithm uses data-set condensing and intersection pruning to f...
Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set
Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) freq...
Finding All Maximal Frequent Sequences in Text
In this paper we present a novel algorithm for discovering maximal frequent sequences in a set of documents, i.e., sequences of words that are frequent in the document collection and, moreover, are not contained in any other longer frequent sequence. A sequence is considered to be frequent if it appears in at least σ documents, where σ is the given frequency threshold. Our approach combine...
Discovery of Frequent Word Sequences in Text
We have developed a method that extracts all maximal frequent word sequences from the documents of a collection. A sequence is said to be frequent if it appears in more than σ documents, where σ is the given frequency threshold. Furthermore, a sequence is maximal if no other frequent sequence exists that contains it. The words of a sequence do not have to appear in text consecutively...