Average Case Performance of the Apriori Algorithm

Authors

  • Paul Purdom
  • Dirk Van Gucht
Abstract

The Apriori Algorithm examines baskets of items to determine which subsets of the items occur in many baskets. Suppose we wish to determine which item sets occur in at least $k$ baskets. The algorithm considers item sets of size $l$ in the order $l = 1, 2, \ldots$. The only way this algorithm can determine that a set occurs at least $k$ times is to count the $k$ occurrences, but it sometimes determines (without counting) that a set occurs fewer than $k$ times by noticing that some subset of the $l$ items occurs fewer than $k$ times. For algorithms that require explicit counting to verify the $k$ occurrences, it is useful to separate the total time into the "success time", which is used to verify $k$ occurrences, and the "failure time", which is used to process sets that have fewer than $k$ occurrences. This paper derives both exact and asymptotic formulas for both success and failure times in the case where the $b$ baskets are filled randomly with probability $p$ (each shopper independently buys each item). The Apriori Algorithm considers almost every possible set of $l$ items for those $l$ where $k \le bp^l$ and almost no sets for larger $l$. For most applications the largest $l$ such that $k \le bp^l$ is not very large. When it is less than one half of the number of items (essentially the only case of interest), the work associated with this largest such $l$ dominates the running time. The probability that a particular set needs processing approaches zero at a rate that is a negative exponential function of the square of the difference $bp^l - k$ when $k$ is above $bp^l$. When $k$ is large compared to 1, the probability that the set needs processing approaches 1 at a similar negative exponential rate.
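To make the success/failure split concrete, here is a minimal Python sketch of a level-wise Apriori search on randomly filled baskets. It is an illustration under stated assumptions, not the authors' implementation: the function names (`random_baskets`, `apriori`, `prob_at_least_k`), the item count `m`, and the naive pairwise candidate join are hypothetical choices. `success` counts candidates whose count reaches $k$, `failure` counts candidates that are counted but fall short, and sets with an infrequent subset are pruned without being counted at all. `prob_at_least_k` is the binomial tail for one fixed $l$-item set, whose count under the random model is Binomial$(b, p^l)$ with mean $bp^l$; it is not the paper's formula for the probability that a set needs processing, which also depends on the set's subsets.

```python
import random
from itertools import combinations
from math import comb


def random_baskets(b, m, p, seed=0):
    """Hypothetical generator: b baskets over items 0..m-1, each item
    included independently with probability p."""
    rng = random.Random(seed)
    return [frozenset(i for i in range(m) if rng.random() < p) for _ in range(b)]


def apriori(baskets, k):
    """Level-wise Apriori sketch.

    Returns (frequent, success, failure): `frequent` maps every item set that
    occurs in at least k baskets to its count; `success` is the number of
    counted candidates that reached k; `failure` is the number of counted
    candidates that fell short of k.
    """
    items = sorted({i for basket in baskets for i in basket})
    frequent = {}
    success = failure = 0
    candidates = [frozenset([i]) for i in items]  # level l = 1
    size = 1
    while candidates:
        # Explicit counting: scan every basket for every candidate.
        counts = {c: sum(1 for basket in baskets if c <= basket) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= k}
        success += len(level)
        failure += len(counts) - len(level)
        frequent.update(level)
        # Join pairs of frequent sets into sets one item larger, keeping only
        # those whose every subset of the previous size is frequent; the rest
        # are rejected without counting.
        size += 1
        joined = {x | y for x in level for y in level if len(x | y) == size}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, size - 1))]
    return frequent, success, failure


def prob_at_least_k(b, p, l, k):
    """P[a fixed l-item set occurs in >= k of b baskets]; its count is
    Binomial(b, p**l) under the independent-purchase model."""
    q = p ** l
    return sum(comb(b, j) * q**j * (1 - q) ** (b - j) for j in range(k, b + 1))
```

For example, `apriori(random_baskets(200, 10, 0.3), 20)` uses $b = 200$ and $p = 0.3$, so $bp^l$ is 60, 18 and 5.4 for $l = 1, 2, 3$. With $k = 20$ one would expect nearly every single item to be frequent, only a fraction of the pairs, and essentially no triples, in line with the $k \le bp^l$ threshold described above.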

Similar resources

Fuzzy Apriori Rule Extraction Using Multi-Objective Particle Swarm Optimization: The Case of Credit Scoring

Many methods have been introduced to solve the credit scoring problem, such as support vector machines, neural networks and rule-based classifiers. Rule bases are favoured in credit decision making because they explicitly distinguish between good and bad applicants. In this paper, multi-objective particle swarm optimization is applied to optimize a fuzzy Apriori rule base in credit scoring. ...

Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO)

A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metadata values. We evaluate our framework in the context of experimental metadata f...

Fuzzy association rule mining approaches for enhancing prediction performance

This paper presents an investigation into two fuzzy association rule mining models for enhancing prediction performance. The first model (the FCM-Apriori model) integrates Fuzzy C-Means (FCM) and the Apriori approach for road traffic performance prediction. FCM is used to define the membership functions of fuzzy sets and the Apriori approach is employed to identify the Fuzzy Association Rules (...

Eco-Efficiency Evaluation in Two-Stage Network Structure: Case Study: Cement Companies

The cement industry, as a primary trade, plays an important role in the development of a country's organization. In Iran, however, this industry faces many challenges despite profuse benefits such as high-value mines. Problems such as exploitation of the production make research in this area necessary. The main purpose of this paper is to examine the Eco-efficiency in Iran's 2...

Publication date: 2007