Effect of Data Skewness in Parallel Mining of Association Rules
Author
Abstract
An efficient parallel algorithm, FPM (Fast Parallel Mining), has been proposed for mining association rules on a shared-nothing parallel system. It adopts the count distribution approach and incorporates two powerful candidate pruning techniques: distributed pruning and global pruning. Its communication scheme is simple, performing only one round of message exchange in each iteration. We found that both pruning techniques are very sensitive to data skewness, which describes the degree of non-uniformity of the itemset distribution among the database partitions. Distributed pruning is very effective when data skewness is high, while global pruning is more effective than distributed pruning even when data skewness is mild. We have implemented the algorithm on an IBM SP2 parallel machine. The performance studies confirm our observation on the relationship between data skewness and the effectiveness of the two pruning techniques. They also show that FPM consistently outperforms CD (Count Distribution), a parallel version of the popular Apriori algorithm [2, 3]. Furthermore, FPM exhibits good parallelism in terms of speedup, scaleup, and sizeup.
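The abstract names the two pruning techniques without stating their rules, so the following is only a minimal Python sketch of one FPM-style iteration under stated assumptions, not the authors' implementation. It assumes the database is simulated in memory as a list of partitions (one per processor), itemsets are sorted tuples, and the single round of message exchange is modelled by summing per-partition counts. The rules reflect our reading of the techniques: distributed pruning generates size-k candidates at each partition only from the (k-1)-itemsets that are both globally large and locally large there; global pruning drops a candidate X when the sum over partitions of the smallest local count among X's (k-1)-subsets (an upper bound on X's global count) falls below the support threshold. The function names `count_local`, `globally_large`, `apriori_gen`, `distributed_pruning`, and `global_pruning`, and the toy data, are hypothetical.

```python
from itertools import combinations

# Sketch of one FPM-style iteration on in-memory partitions. All names are
# hypothetical; the pruning rules are our reading of the technique.

def count_local(candidates, partitions):
    """Each processor counts every candidate over its own partition only."""
    return [{c: sum(1 for t in part if set(c) <= t) for c in candidates}
            for part in partitions]

def globally_large(candidates, local_counts, min_count):
    """Models the single count exchange: sum local counts, apply threshold."""
    return {c for c in candidates
            if sum(lc[c] for lc in local_counts) >= min_count}

def apriori_gen(prev):
    """Join (k-1)-itemsets sharing a (k-2)-prefix, keeping a k-itemset only
    if all of its (k-1)-subsets are in prev."""
    prev = set(prev)
    out = set()
    for a, b in combinations(sorted(prev), 2):
        if a[:-1] == b[:-1]:
            c = a + (b[-1],)  # still sorted because a < b
            if all(s in prev for s in combinations(c, len(c) - 1)):
                out.add(c)
    return out

def distributed_pruning(large_prev, local_counts, sizes, min_sup):
    """Generate candidates per partition from the (k-1)-itemsets that are
    globally large AND locally large there, then take the union."""
    result = set()
    for lc, size in zip(local_counts, sizes):
        gl_i = {y for y in large_prev if lc.get(y, 0) >= min_sup * size}
        result |= apriori_gen(gl_i)
    return result

def global_pruning(candidates, local_counts, min_count):
    """Drop X if sum_i of min over (k-1)-subsets Y of Y's count at partition i,
    an upper bound on X's global count, is below min_count."""
    kept = set()
    for x in candidates:
        subsets = list(combinations(x, len(x) - 1))
        bound = sum(min(lc.get(y, 0) for y in subsets) for lc in local_counts)
        if bound >= min_count:
            kept.add(x)
    return kept

# Invented, deliberately skewed toy data: two partitions of four transactions.
partitions = [
    [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "c"}],
    [{"d", "e"}, {"d", "e"}, {"a", "d"}, {"b", "e"}],
]
min_sup = 0.375
sizes = [len(p) for p in partitions]
min_count = min_sup * sum(sizes)

# Iteration 1: count single items.
items = sorted(set().union(*(t for p in partitions for t in p)))
c1 = {(i,) for i in items}
lc1 = count_local(c1, partitions)
l1 = globally_large(c1, lc1, min_count)

# Iteration 2: prune candidates before the counting pass.
plain = apriori_gen(l1)
c2 = global_pruning(distributed_pruning(l1, lc1, sizes, min_sup), lc1, min_count)
l2 = globally_large(c2, count_local(c2, partitions), min_count)
print(len(plain), sorted(c2), sorted(l2))
# prints: 10 [('a', 'b'), ('a', 'c'), ('b', 'c'), ('d', 'e')] [('a', 'b'), ('a', 'c')]
```

On this skewed toy data, where items a, b, c occur almost entirely in the first partition and d, e in the second, plain Apriori generation yields 10 size-2 candidates, while pruning with the local counts from iteration 1 leaves only 4 to be counted; applying global pruning alone to the 10 Apriori candidates happens to reach the same 4 here.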
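The abstract defines data skewness only qualitatively, as the non-uniformity of the itemset distribution among partitions. One way to quantify it, assumed here rather than taken from the abstract, is entropy-based: for an itemset X over n partitions, let p_i be the fraction of X's support count that falls in partition i, compute H(X) = -Σ p_i log p_i, and normalise as S(X) = (log n - H(X)) / log n. S(X) is then 0 when X is spread evenly and 1 when X occurs in a single partition; a database-level figure can be taken as the support-weighted average of S(X). The function names `itemset_skewness` and `database_skewness` are hypothetical.

```python
import math

# Hypothetical helpers: an entropy-based skewness measure assumed for
# illustration, not a formula quoted from the abstract.

def itemset_skewness(counts):
    """Return S(X) in [0, 1] from X's support counts in each of n partitions."""
    n = len(counts)
    total = sum(counts)
    if n < 2 or total == 0:
        return 0.0  # skewness is undefined here; treat as uniform
    # Entropy of the distribution of X's support across partitions.
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return (math.log(n) - h) / math.log(n)

def database_skewness(count_table):
    """Support-weighted average of S(X); count_table maps itemset -> counts."""
    weights = {x: sum(cs) for x, cs in count_table.items()}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * itemset_skewness(count_table[x]) for x, w in weights.items()) / total

print(itemset_skewness([5, 5, 5, 5]))   # evenly spread: about 0.0
print(itemset_skewness([20, 0, 0, 0]))  # confined to one partition: 1.0
print(database_skewness({("a",): [10, 0], ("b",): [5, 5]}))  # about 0.5
```

Under this measure, a value near 1 means most supported itemsets are concentrated in few partitions, which is the situation in which the abstract reports distributed pruning to be most effective.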