A Hybrid Method for High-Utility Itemsets Mining in Large High-Dimensional Data

Authors

  • Guangzhu Yu
  • Shihuang Shao
  • Bin Luo
  • Xianhui Zeng
Abstract

Existing algorithms for high-utility itemset mining are column-enumeration based, adopting an Apriori-like candidate set generation-and-test approach, and are therefore inadequate for datasets with high dimensions or long patterns. To solve this problem, this paper proposes a hybrid model and a row-enumeration-based algorithm, Inter-transaction, to discover high-utility itemsets from two directions: an existing algorithm can be used to seek short high-utility itemsets from the bottom, while Inter-transaction can be used to seek long high-utility itemsets from the top. Inter-transaction exploits the characteristic that long transactions have few items in common. By intersecting relevant transactions, the new algorithm can identify long high-utility itemsets without extending short itemsets step by step. In addition, we develop new pruning strategies and an optimization technique to improve the performance of Inter-transaction.
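
The row-enumeration idea described in the abstract can be illustrated with a small sketch. This is not the authors' Inter-transaction algorithm: the toy transactions, the unit profits, and the function names (`utility`, `intersection_candidates`, `high_utility_itemsets`) are all hypothetical, and the pruning strategies and optimization mentioned in the abstract are omitted. The sketch only shows how intersecting transactions can yield long candidate itemsets directly, whose utility is then checked against a threshold.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1, "d": 1},
    {"a": 2, "b": 1, "c": 3, "e": 1},
    {"b": 4, "d": 2},
    {"a": 1, "b": 1, "c": 2, "d": 3},
]
# Hypothetical unit profit per item.
profit = {"a": 5, "b": 2, "c": 3, "d": 1, "e": 10}


def utility(itemset, transactions, profit):
    """Sum quantity * unit profit over every transaction containing the itemset."""
    total = 0
    for t in transactions:
        if itemset.issubset(t):  # iterating a dict yields its keys (the items)
            total += sum(t[i] * profit[i] for i in itemset)
    return total


def intersection_candidates(transactions, max_rows=2):
    """Collect the non-empty item intersections of every group of up to max_rows transactions."""
    candidates = set()
    for k in range(1, max_rows + 1):
        for rows in combinations(transactions, k):
            common = frozenset(set.intersection(*(set(t) for t in rows)))
            if common:
                candidates.add(common)
    return candidates


def high_utility_itemsets(transactions, profit, min_util, max_rows=2):
    """Keep the intersection candidates whose utility reaches min_util."""
    result = {}
    for candidate in intersection_candidates(transactions, max_rows):
        u = utility(candidate, transactions, profit)
        if u >= min_util:
            result[candidate] = u
    return result


if __name__ == "__main__":
    for itemset, u in high_utility_itemsets(transactions, profit, min_util=25).items():
        print(sorted(itemset), u)
```

With a threshold of 25, the sketch reports the three- and four-item sets {a, b, c}, {a, b, c, d}, and {a, b, c, e} without first building them up from single items. Because it only considers intersections of at most `max_rows` transactions, it is not a complete miner; in the hybrid model the abstract describes, short itemsets would be left to an existing bottom-up algorithm.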


Similar resources

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is an emerging field in data mining that has gained growing interest due to its various applications. The goal is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...


Data sanitization in association rule mining based on impact factor

Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...


An Efficient Method for Mining Frequent Itemsets in Market Basket Data Analysis

The discovery of hidden and valuable knowledge from large data warehouses is an important research area and has attracted the attention of many researchers in recent years. Most Association Rule Mining (ARM) algorithms start by searching for frequent itemsets, scanning the whole database repeatedly and counting the occurrences of each candidate itemset. In data mining problems, the size of ...


Enhancing the Performance of Mining High Utility Itemsets Based On Pattern Algorithm

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. An association in data mining indicates a logical dependency between various attributes of an entity. Association rule mining (ARM) is the process of mining past data for association rules. ARM only finds the frequency of itemsets, which does not by itself indicate a large amount of profit. Ut...


Mining Long High Utility Itemsets in Transaction Databases

Although support has been used as a fundamental measure to determine the statistical importance of an itemset, it cannot express richer information such as quantity sold, unit profit, or other numerical attributes. To overcome this shortcoming, utility is used to measure semantic importance, and several algorithms for utility mining have been proposed. However, existing algorithms for ut...



Journal:
  • IJDWM

Volume 5

Publication date: 2009