Mining itemset utilities from transaction databases
Abstract
The rationale behind mining frequent itemsets is that only itemsets with high frequency are of interest to users. However, the practical usefulness of frequent itemsets is limited by the significance of the discovered itemsets. A frequent itemset reflects only the statistical correlation between items; it does not reflect the semantic significance of the items. In this paper, we propose a utility-based itemset mining approach to overcome this limitation. The proposed approach permits users to quantify their preferences concerning the usefulness of itemsets using utility values. The usefulness of an itemset is characterized as a utility constraint. That is, an itemset is interesting to the user only if it satisfies a given utility constraint. We show that the pruning strategies used in previous itemset mining approaches cannot be applied to utility constraints. In response, we identify several mathematical properties of utility constraints. Then, two novel pruning strategies are designed. Two algorithms for utility-based itemset mining are developed by incorporating these pruning strategies. The algorithms are evaluated by applying them to synthetic and real-world databases. Experimental results show that the proposed algorithms are effective on the databases tested. © 2005 Elsevier B.V. All rights reserved.
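To make the abstract's central claim concrete, the following is a minimal sketch in Python, not the paper's algorithms. It assumes a commonly used utility definition that the abstract itself does not spell out: the utility of an itemset is the sum, over the transactions containing it, of purchased quantity times a user-supplied unit utility, and the utility constraint is a simple minimum threshold. The toy data, the threshold, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"A": 1, "B": 1},
    {"A": 2, "C": 1},
    {"A": 1, "B": 2, "C": 1},
    {"B": 1},
]
# Hypothetical per-unit utilities (for example, profit), supplied by the user.
unit_utility = {"A": 1, "B": 5, "C": 10}


def support(itemset, db):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in db if all(i in t for i in itemset))


def utility(itemset, db, unit):
    """Assumed definition: sum of quantity * unit utility over the
    transactions that contain the whole itemset."""
    return sum(
        sum(t[i] * unit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def high_utility_itemsets(db, unit, min_util):
    """Brute-force enumeration of all itemsets; exponential in the number
    of items, so suitable for illustration only."""
    items = sorted({i for t in db for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for s in combinations(items, k):
            u = utility(s, db, unit)
            if u >= min_util:
                result[s] = u
    return result


if __name__ == "__main__":
    for s in [("A",), ("C",), ("A", "C")]:
        print(s, "support =", support(s, transactions),
              "utility =", utility(s, transactions, unit_utility))
    print(high_utility_itemsets(transactions, unit_utility, 21))
```

With a threshold of 21 in this toy example, neither ("A",) nor ("C",) qualifies, yet their superset ("A", "C") does. Support can never grow when an item is added, which is what frequency-based pruning relies on; utility under the assumed definition has no such guarantee, so discarding every superset of a low-utility itemset would lose valid results.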
Similar resources
A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURI
Classical frequent itemset mining identifies frequent itemsets in transaction databases using only the frequency of item occurrences, without considering the utility of items. In many real-world situations, the utility of itemsets is based on the user's perspective, such as cost, profit, or revenue, and is of significant importance. Utility mining considers using utility factors in data mining tasks. Utility-...
A Foundational Approach to Mining Itemset Utilities from Databases
Most approaches to mining association rules implicitly consider the utilities of the itemsets to be equal. We assume that the utilities of itemsets may differ, and identify the high utility itemsets based on information in the transaction database and external information about utilities. Our theoretical analysis of the resulting problem lays the foundation for future utility mining algorithms.
Discovery of High Utility Itemsets Using Genetic Algorithm with Ranked Mutation
Utility mining is the study of itemset mining that takes utilities into consideration. It is a utility-based itemset mining approach for finding itemsets that conform to user preferences. Modern research in mining high-utility itemsets (HUI) from databases faces two major challenges: an exponential search space and a database-dependent minimum utility threshold. The search space is extremely vast when t...
Generating Frequent Patterns Through Intersection Between Transactions
The problem of frequent itemset mining is considered in this paper. A new technique is proposed to generate frequent patterns in large databases without time-consuming candidate generation. The technique focuses on transactions instead of itemsets. The algorithm is based on taking the intersection between one transaction and the other transactions, and the maximum shared items be...
High Utility Itemset Mining
Data mining can be defined as an activity that extracts new, nontrivial information contained in large databases. Traditional data mining techniques have focused largely on detecting statistical correlations between the items that are more frequent in transaction databases. Also termed frequent itemset mining, these techniques were based on the rationale that itemsets which appe...
Journal: Data Knowl. Eng.
Volume: 59
Publication date: 2006