On Mining Max Frequent Generalized Itemsets

Authors

  • Donghui Zhang
  • Daniel Kunkle

Abstract

A fundamental task of data mining is to mine frequent itemsets. Since the number of frequent itemsets may be large, a compact representation, namely the max frequent itemsets, has been introduced. Separately, the concept of generalized itemsets has been proposed. In that setting, the items form a taxonomy: although the transactional database contains only items at the leaf level of the taxonomy, a generalized itemset may also contain non-leaf (generalized) items. This raises a natural question: can the two concepts be combined to mine max frequent generalized itemsets, a compact representation of the set of all frequent generalized itemsets? To the best of our knowledge, this paper is the first to solve this new problem efficiently. Our solution has the following components: a conceptual classification tree; the algorithm MFGI class, which dynamically generates the needed part of the conceptual classification tree by applying three pruning techniques; an online method for checking false positives; and an optimization technique called PHDB that batch-computes frequencies. Besides market-basket analysis, our technique has a wide range of other applications, for instance identifying associations among (categories of) diseases or among (groups of) occupations. Experimental results demonstrate the efficiency of our approach.
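
To make the problem definition concrete, here is a minimal brute-force sketch of what "max frequent generalized itemsets" means. It is not the MFGI class algorithm, which avoids this exhaustive enumeration by generating only the needed part of a conceptual classification tree with pruning. The taxonomy, transactions, `minsup` value and all function names are hypothetical. The sketch assumes three things: a transaction supports a generalized itemset when every item in the itemset is either in the transaction or an ancestor of an item in it; a generalized itemset may not contain an item together with one of its ancestors; and an itemset X is max frequent when it is frequent and no other frequent itemset Y is at least as specific, meaning every item of X is in Y or is an ancestor of an item in Y.

```python
from itertools import combinations

# Hypothetical toy taxonomy (child -> parent); only leaves occur in transactions.
PARENT = {
    "milk": "dairy", "cheese": "dairy",
    "bread": "bakery", "bagel": "bakery",
    "dairy": "food", "bakery": "food",
}


def ancestors(item, parent):
    """All proper ancestors of `item` in the taxonomy."""
    result = set()
    while item in parent:
        item = parent[item]
        result.add(item)
    return result


def extend(itemset, parent):
    """Add every ancestor of every item, so generalized items match by a subset test."""
    ext = set(itemset)
    for item in itemset:
        ext |= ancestors(item, parent)
    return ext


def is_generalized_itemset(itemset, parent):
    """Valid only if no item is an ancestor of another item in the same set."""
    return all(not (ancestors(i, parent) & itemset) for i in itemset)


def max_frequent_generalized(transactions, parent, minsup):
    """Brute force over all candidate itemsets (exponential; toy inputs only)."""
    items = sorted(set(parent) | set(parent.values()))
    extended = [extend(t, parent) for t in transactions]
    frequent = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            cand = frozenset(combo)
            if not is_generalized_itemset(cand, parent):
                continue
            support = sum(1 for ext in extended if cand <= ext)
            if support >= minsup:
                frequent.append(cand)
    # Drop X if another frequent Y is at least as specific, i.e. X <= extend(Y).
    maximal = [x for x in frequent
               if not any(y != x and x <= extend(y, parent) for y in frequent)]
    return sorted(tuple(sorted(x)) for x in maximal)


transactions = [{"milk", "bread"}, {"cheese", "bagel"}, {"milk", "bagel"}, {"milk"}]
print(max_frequent_generalized(transactions, PARENT, minsup=2))
# [('bagel', 'dairy'), ('bakery', 'milk')]
```

In this toy run, {dairy, bakery} is frequent but not maximal, because {milk, bakery} is frequent and more specific. The two results are incomparable, which is why the representation keeps both. The exhaustive enumeration is exactly the cost that a pruned, dynamically generated search space is meant to avoid.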

Publication date: 2005