FARMER: Finding Interesting Rule Groups in Biological Datasets
Abstract
The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number of columns and a small number of rows. For example, many gene expression datasets may contain 10,000-100,000 columns but only 100-1,000 rows. Association rules can reveal biologically relevant associations between genes and environments/categories, helping to identify gene regulation pathways. However, most existing association rule mining algorithms have an exponential dependence on the number of columns. Moreover, the number of association rules generated from bioinformatics datasets is enormous due to the combinatorial explosion of frequent itemsets. In this paper, we describe a new algorithm called FARMER that is specially designed to discover interesting rule groups in biological datasets by identifying their upper bounds and lower bounds. FARMER exploits all user-specified constraints, including minimum support, minimum confidence, and minimum chi-square, to support efficient pruning. Several experiments on real bioinformatics datasets show that FARMER is orders of magnitude faster than previous association rule mining algorithms.
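The three constraints named in the abstract (minimum support, minimum confidence, minimum chi-square) can be illustrated on a toy dataset. The sketch below is not the FARMER algorithm and does no pruning; it only evaluates one rule of the form "item set => class label". The data, the function names `rule_stats` and `passes_constraints`, and the choice to measure support as an absolute row count are all assumptions made for illustration.

```python
# Toy dataset (hypothetical): each row is (set of items, class label).
ROWS = [
    ({"g1", "g2"}, "cancer"),
    ({"g1", "g2", "g3"}, "cancer"),
    ({"g1"}, "normal"),
    ({"g3"}, "normal"),
]

def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence, chi-square) for antecedent => consequent."""
    n = len(rows)
    # Rows containing every item of the antecedent.
    a = sum(1 for items, _ in rows if antecedent <= items)
    # Rows carrying the consequent class label.
    c = sum(1 for _, label in rows if label == consequent)
    # Rows satisfying both sides of the rule.
    ac = sum(1 for items, label in rows
             if antecedent <= items and label == consequent)

    support = ac                          # assumption: absolute row count
    confidence = ac / a if a else 0.0

    # Pearson chi-square on the 2x2 table (antecedent yes/no) x (class yes/no).
    observed = [[ac, a - ac], [c - ac, n - a - c + ac]]
    row_totals = [a, n - a]
    col_totals = [c, n - c]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return support, confidence, chi2

def passes_constraints(stats, min_sup, min_conf, min_chi2):
    """Check a rule's statistics against the three user-specified thresholds."""
    support, confidence, chi2 = stats
    return support >= min_sup and confidence >= min_conf and chi2 >= min_chi2

stats = rule_stats(ROWS, {"g1", "g2"}, "cancer")
print(stats, passes_constraints(stats, min_sup=2, min_conf=0.9, min_chi2=3.0))
```

A rule that fails any one threshold would be discarded; an algorithm that uses these thresholds for pruning aims to skip such rules without enumerating them one by one.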
Similar resources
Semantic Mining and Analysis of Gene Expression Data
Association rules can reveal biologically relevant relationships between genes and environments/categories. However, most existing association rule mining algorithms are rendered impractical on gene expression data, which typically contains thousands or tens of thousands of columns (gene expression levels), but only tens of rows (samples). The main problem is that these algorithms have an expone...
Numeric Multi-Objective Rule Mining Using Simulated Annealing Algorithm
Abstract as a single objective one. Measures like support, confidence and other interestingness criteria, which are used for evaluating a rule, can be thought of as different objectives of the association rule mining problem. Support count is the number of records that satisfy all the conditions in the rule. This objective represents the accuracy of the rules extracted from the da...
Regional Association Rule Mining
This project [4] centers on regional association rule mining and scoping in spatial datasets. We introduce a methodology for mining spatial association rules and propose new algorithms to determine the scope of a spatial association rule. We develop a reward-based region discovery framework that employs clustering to find interesting regions. The framework is applied to solve two distinct reg...
Efficient Association Rule Mining Using Improved Apriori Algorithm
Association rule mining is a data mining technique to extract interesting relationships from large datasets [1, 2]. The efficiency of association rule mining algorithms has been a challenging research area in the domain of data mining [3]. Frequent pattern discovery, the task of finding sets of items that frequently occur together in a dataset, is the most resource-consuming phase of the rule mi...
Smart Drill Down
We present smart drill-down, an operator for interactively exploring a relational table to discover and summarize “interesting” groups of tuples. Each group of tuples is described by a rule. For instance, the rule (a, b, ?, 1000) tells us that there are a thousand tuples with value a in the first column and b in the second column (and any value in the third column). Smart drill-down presents an...
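The rule notation in the entry above, where `?` stands for "any value" and the last component is a tuple count, can be made concrete with a few lines of code. This is only a sketch of the counting semantics described in the excerpt, not the smart drill-down operator itself; the table contents and the names `matches` and `count_matches` are hypothetical.

```python
WILDCARD = "?"

def matches(rule, row):
    """True if every non-wildcard position of the rule equals the row's value."""
    return all(r == WILDCARD or r == v for r, v in zip(rule, row))

def count_matches(rule, table):
    """Number of tuples in the table covered by the rule."""
    return sum(1 for row in table if matches(rule, row))

# Hypothetical three-column table.
TABLE = [
    ("a", "b", "x"),
    ("a", "b", "y"),
    ("a", "c", "x"),
]

rule = ("a", "b", WILDCARD)
# Append the count to the rule, mirroring the (a, b, ?, count) notation.
print(rule + (count_matches(rule, TABLE),))
```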
Publication date: 2003