Mining Association Algorithm with Threshold based on ROC Analysis
Authors
Abstract
The mining association algorithm is one of the most important data mining algorithms for deriving association rules at high speed from huge databases. However, the algorithm tends to derive rules that contain noise such as stopwords, so some systems remove the noise using noise filters. We have been improving the algorithm and developing navigation systems for semi-structured data that use it, and we also use a dictionary to remove noise from the derived association rules. To derive effective rules, it is very important to choose system parameters such as the threshold values of the minimum support and the minimum confidence appropriately. We have therefore applied ROC analysis to the algorithm in our navigation systems and evaluated the performance of the derived rules. In this paper, we import the parameters from the ROC analysis into the algorithm to propose extended mining association algorithms. Moreover, we evaluate the performance of our proposed algorithms using an experimental database and show how they can derive effective association rules. We also show that our proposed algorithms can remove stopwords automatically from raw data.
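To make the two thresholds named in the abstract concrete, here is a minimal sketch using the standard textbook definitions of support and confidence, plus a helper that places a derived rule set in ROC space. The toy transactions, the function names, and the `roc_point` helper are illustrative assumptions; this is not the authors' extended algorithm, whose details the abstract does not give.

```python
from itertools import permutations

# Toy transactions (hypothetical data, not from the paper).
transactions = [
    {"data", "mining", "the"},
    {"data", "mining"},
    {"data", "the"},
    {"mining", "the"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base

def derive_rules(transactions, min_support, min_confidence):
    """Single-item rules x -> y that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        s = support({x, y}, transactions)
        c = confidence({x}, {y}, transactions)
        if s >= min_support and c >= min_confidence:
            rules.append((x, y, s, c))
    return rules

def roc_point(derived, relevant, candidates):
    """(false positive rate, true positive rate) of a derived rule set,
    given a hand-labelled set of relevant rules (an assumption here)."""
    derived, relevant, candidates = set(derived), set(relevant), set(candidates)
    irrelevant = candidates - relevant
    tpr = len(derived & relevant) / len(relevant) if relevant else 0.0
    fpr = len(derived & irrelevant) / len(irrelevant) if irrelevant else 0.0
    return fpr, tpr
```

Sweeping `min_support` and `min_confidence` and plotting the resulting `roc_point` values would give one ROC curve per parameter; in this toy data a stopword-like item such as "the" is as frequent as the content words, which illustrates why fixed thresholds alone do not filter it out.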
Similar resources
Mining Association Algorithm with Improved Threshold Based on ROC Analysis
The mining association algorithm is one of the most popular data mining algorithms for deriving association rules at high speed from huge databases. We have been developing navigation systems for semi-structured data such as Web data and bibliographic data. To guide beginners, our systems present the association rules derived by the algorithm. However, the algorithm tends to derive those rules tha...
A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining
Data envelopment analysis (DEA) is a relatively new data-oriented approach for evaluating the performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relatively short period, DEA has become a strong quantitative and analytical tool for measuring and evaluating performance. In an article written by Toloo et al. (2009...
Introducing an algorithm for use to hide sensitive association rules through perturb technique
Due to the rapid growth of data mining technology, obtaining private data on users through this technology is becoming easier. Association rule mining is one of the data mining techniques used to extract useful patterns in the form of association rules. One of the main problems in applying this technique to databases is the disclosure of sensitive data, which endangers security and privacy. Hiding the as...
Investigating the Effect of Land Use and Soil’s Physio-chemical properties on Wind Erosion Threshold Velocities via Data Mining
Introduction: Wind erosion is a phenomenon that causes severe environmental changes in arid and semi-arid climates. As surface soil texture is very effective in soil erodibility, identifying soil erodibility index is important and efficient. Mismanagement greatly contributes to the development of wind erosion. The velocity that makes the first particles of soil move from the surface is called t...
Data sanitization in association rule mining based on impact factor
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...