A Scalable Data Analytics Algorithm for Mining Frequent Patterns from Uncertain Data
Authors
Abstract
With advances in technology, massive amounts of valuable data can be collected and transmitted at high velocity in various scientific, biomedical or engineering applications. Hence, scalable data analytics tools are in demand for analyzing these data. For example, scalable tools for association analysis help reveal frequently occurring patterns and their relationships, which in turn support intelligent decision making. While the majority of existing frequent pattern mining algorithms, including FP-growth, handle only precise data, there are situations in which the data are uncertain. In recent years, researchers have paid attention to frequent pattern mining from uncertain data. UF-growth and UFP-growth are examples of tree-based algorithms for mining uncertain data. However, their corresponding tree structures can be large. Other tree structures for handling uncertain data may achieve compactness at the expense of loose upper bounds on expected supports. To solve this problem, we propose (i) a compact tree structure that captures uncertain data with tighter upper bounds than the aforementioned tree structures and (ii) a scalable data analytics algorithm that mines frequent patterns from our tree structure. Experimental results show the tightness of the upper bounds on expected supports provided by our algorithm.
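To make expected support and its upper bounds concrete, here is a minimal Python sketch. It assumes the common existential-probability model: each item in a transaction carries the probability that it is really present, items are treated as independent, and the expected support of an itemset is the sum, over transactions, of the product of its items' probabilities. The toy database, the item order `ORDER`, and the functions `expected_support`, `loose_bound` and `prefix_cap_bound` are hypothetical. The two bounds are illustrative caps that a tree node could store instead of exact probabilities; they are assumptions of this sketch, not the tree structure or bound proposed in the paper.

```python
import math

# Hypothetical toy uncertain database: each transaction maps an item to its
# existential probability (the chance that the item is really present).
TOY_DB = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "c": 0.7},
    {"b": 0.4, "c": 0.9},
    {"a": 1.0, "b": 0.5},
]

# Assumed global item order, e.g. the order in which items are inserted into a tree.
ORDER = ["a", "b", "c"]


def expected_support(itemset, db):
    """Sum over transactions of the product of the itemset's probabilities,
    assuming independent items. Transactions missing any item contribute 0."""
    return sum(
        math.prod(t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def _supporting(itemset, db):
    """Transactions that contain every item of the itemset."""
    return [t for t in db if all(i in t for i in itemset)]


def _last_item(itemset):
    """The item of the itemset that comes last in ORDER."""
    return max(itemset, key=ORDER.index)


def loose_bound(itemset, db):
    """Upper bound that keeps only the last item's probability, treating the
    other items' probabilities as 1."""
    x_r = _last_item(itemset)
    return sum(t[x_r] for t in _supporting(itemset, db))


def prefix_cap_bound(itemset, db):
    """Tighter bound for itemsets of size >= 2: the last item's probability
    times the largest probability among items preceding it in the same
    transaction. Valid because the product of the remaining items is at most
    the probability of any one of them, which is at most that maximum."""
    if len(itemset) < 2:
        raise ValueError("prefix_cap_bound needs at least two items")
    x_r = _last_item(itemset)
    rank = ORDER.index(x_r)
    total = 0.0
    for t in _supporting(itemset, db):
        prefix_max = max(p for i, p in t.items() if ORDER.index(i) < rank)
        total += t[x_r] * prefix_max
    return total


if __name__ == "__main__":
    for pair in [{"a", "b"}, {"a", "c"}, {"b", "c"}]:
        print(sorted(pair),
              round(expected_support(pair, TOY_DB), 2),
              round(prefix_cap_bound(pair, TOY_DB), 2),
              round(loose_bound(pair, TOY_DB), 2))
```

On this toy data, {b, c} has an expected support of 0.76, the prefix-based cap gives 0.81 and the loose cap gives 1.4. With a minimum expected support of 1.0, the tighter cap alone is enough to discard {b, c} without computing its exact expected support, whereas the loose cap keeps it as a candidate that must be verified. Tighter bounds therefore let fewer false-positive candidates reach the exact-support check.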
Similar sources
Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows
Today, in many modern applications, we search for frequent and recurring patterns in the analyzed data sets. In this search, we look for patterns that appear frequently in the data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...
Review of Algorithm for Mining Frequent Patterns from Uncertain Data
Mining frequent patterns from traditional databases is an important research topic in data mining, and researchers have achieved tremendous progress in this field. However, with high volumes of uncertain data generated in distributed environments by many biological, medical and life science applications in the past ten years, researchers have proposed different solutions in extending the conventiona...
Data sanitization in association rule mining based on impact factor
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...
A Survey Paper on Frequent Pattern Mining for Uncertain Database
A number of existing algorithms have been proposed that mine frequent patterns from certain or precise data. Nowadays, however, the demand for uncertain data mining has increased, since there are many situations in which data are uncertain. For frequent pattern mining from uncertain data, two main approaches have been proposed: the level-wise approach and the pattern-growth approach. The level-wise approach uses the ge...
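The level-wise approach named in this summary can be illustrated with a minimal Python sketch of generic Apriori-style mining adapted to expected support. The toy database and the function names `expected_support` and `level_wise_mine` are hypothetical, items are assumed to be independent, and this is not presented as the specific algorithm covered by the survey.

```python
import math
from itertools import combinations

# Hypothetical toy uncertain database (item -> existential probability).
TOY_DB = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "c": 0.7},
    {"b": 0.4, "c": 0.9},
    {"a": 1.0, "b": 0.5},
]


def expected_support(itemset, db):
    """Expected support under the independent existential-probability model."""
    return sum(
        math.prod(t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def level_wise_mine(db, minsup):
    """Apriori-style level-wise mining with expected support.

    Level k tests candidate k-itemsets against the database, then joins the
    frequent ones into (k+1)-candidates. Candidates with an infrequent subset
    are pruned, which is safe because expected support never increases when
    an item is added to an itemset."""
    frequent = {}
    candidates = {frozenset([i]) for t in db for i in t}
    k = 1
    while candidates:
        # Test step: compute the exact expected support of each candidate.
        level = {}
        for c in candidates:
            s = expected_support(c, db)
            if s >= minsup:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


if __name__ == "__main__":
    for itemset, s in sorted(level_wise_mine(TOY_DB, 1.0).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), round(s, 2))
```

With a minimum expected support of 1.0, the toy data yields {a}, {b}, {c} and {a, b} (expected support about 1.22), while {a, c} and {b, c} fail the test step. Each level rescans the database for every candidate, which is the cost that pattern-growth (tree-based) methods aim to reduce.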
Vertical Mining of Frequent Patterns from Uncertain Data
Efficient algorithms have been developed for mining frequent patterns in traditional data where the content of each transaction is definitely known. There are many applications that deal with real data sets where the contents of the transactions are uncertain. Limited research work has been dedicated to mining frequent patterns from uncertain data. This is done by extending the state of the art ho...