Efficiently Methods for Embedded Frequent Subtree Mining on Biological Data
Authors
Abstract
Data mining, which draws on database technology, statistics, and AI, gives biological research a useful tool for analyzing information. Two factors chiefly limit the performance of biological data mining approaches: the large scale of biological data and the high similarity among the mined patterns. In this paper, we present an efficient algorithm, named IRTM, for mining frequent subtrees embedded in biological data. We also propose a string encoding method for representing trees, and a scope-list for extending all substrings for the frequency test. The IRTM algorithm adopts a vertical mining approach and uses pruning techniques to further reduce computation time and space cost. Experimental results show that IRTM achieves a significant performance improvement over previous work.
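The abstract does not spell out IRTM's string encoding or the layout of its scope-list, so the following is only a minimal sketch of one common convention from the frequent-subtree literature (the preorder encoding with a backtrack marker and the scope intervals used by TreeMiner), not the paper's own method. The names `Node`, `encode`, `scopes`, and `is_ancestor` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical labeled, ordered tree node (not from the paper)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def encode(root):
    """Preorder string encoding: emit each label on the way down and
    a "-1" marker every time the traversal backtracks to a parent."""
    out = []

    def visit(node):
        out.append(node.label)
        for child in node.children:
            visit(child)
            out.append("-1")

    visit(root)
    return out


def scopes(root):
    """Return (label, (l, u)) for every node in preorder, where l is the
    node's preorder index and u is the index of its rightmost descendant."""
    result = []

    def visit(node):
        index = len(result)
        result.append(None)  # reserve the slot; filled in after the subtree
        for child in node.children:
            visit(child)
        result[index] = (node.label, (index, len(result) - 1))

    visit(root)
    return result


def is_ancestor(a, b):
    """True if the node with scope a is a proper ancestor of the node
    with scope b (a starts earlier and its interval contains b's)."""
    return a[0] < b[0] and b[1] <= a[1]


# Usage: A has children B and C; B has one child D.
tree = Node("A", [Node("B", [Node("D")]), Node("C")])
print(" ".join(encode(tree)))  # A B D -1 -1 C -1
print(scopes(tree))
```

Under this convention, an embedded (ancestor-descendant) relationship between two matched nodes reduces to an interval-containment check on their scopes, which is what makes a vertical, list-joining style of frequency counting possible without rescanning the trees.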
Similar resources
PrefixTreeESpan: A Pattern Growth Algorithm for Mining Embedded Subtrees
Frequent embedded subtree pattern mining is an important data mining problem with broad applications. In this paper, we propose a novel embedded subtree mining algorithm, called PrefixTreeESpan (i.e. Prefix-Tree-projected Embedded-Subtree pattern), which finds a subtree pattern by growing a frequent prefix-tree. Thus, using divide and conquer, mining local length-1 frequent subtree patterns in P...
Discovering Frequent Embedded Subtree Patterns from Large Databases of Unordered Labeled Trees
Recent years have witnessed a surge of research interest in knowledge discovery from data domains with complex structures, such as trees and graphs. In this paper, we address the problem of mining maximal frequent embedded subtrees which is motivated by such important applications as mining “hot” spots of Web sites from Web usage logs and discovering significant “deep” structures from tree-like...
A New Marketing Channel Management Strategy Based on Frequent Subtree Mining
For most manufacturers, success or failure is determined by how effectively and efficiently their products are sold through their marketing channel members, so the management of marketing channels plays an important role in market competition. Most existing work studies the problem of marketing channel management in a qualitative way. Recently, with the increasing amount of sales data, how to ...
Frequent Subtree Mining - An Overview
Mining frequent subtrees from databases of labeled trees is a new research field that has many practical applications in areas such as computer networks, Web mining, bioinformatics, XML document mining, etc. These applications share a requirement for the more expressive power of labeled trees to capture the complex relations among data entities. Although frequent subtree mining is a more diffic...
PCITMiner- Prefix-based Closed Induced Tree Miner for finding closed induced frequent subtrees
Frequent subtree mining has attracted a great deal of interest among researchers due to its application in a wide variety of domains, including bioinformatics, XML processing, computational linguistics, and web usage mining. Despite the advances in frequent subtree mining, mining the entire set of frequent subtrees is infeasible due to the combinatorial explosion of the freq...