Efficiency Improvements for Parallel Subgraph Miners
Abstract
Algorithms for finding frequent and/or interesting subgraphs in a single large graph are computationally intensive because they rely on graph isomorphism and subgraph isomorphism tests. The cost is compounded by the size of most real-world datasets, which is on the order of 10 or 10. The SUBDUE algorithm developed by Cook and Holder finds the subgraph that best compresses a large graph. To perform the same task efficiently on real-world datasets, Cook et al. developed SP-SUBDUE, a parallel version of SUBDUE based on the MPI framework. This paper extends the work of Cook et al. to improve the efficiency of MPI SUBDUE by modifying the evaluation phase. Our experiments show an improvement in speed-up while retaining the quality of the results of serial SUBDUE. The techniques used in this study also apply to similar algorithms that rely on static partitioning of the data and re-evaluation of locally interesting patterns over all the nodes of the cluster.
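The general scheme named in the last sentence, static partitioning followed by re-evaluation of locally interesting patterns on every node, can be illustrated with a toy sketch. This is not SUBDUE or SP-SUBDUE: it runs sequentially without MPI, treats a single labelled edge as a "pattern", scores by raw occurrence count instead of compression, and ignores edges that cross partitions. The function names (`local_candidates`, `evaluate`, `mine`) and the data layout are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumption: each partition is a list of labelled edges
# (source_label, edge_label, target_label). A "pattern" here is one
# labelled edge, a stand-in for a real substructure.

def local_candidates(partition, k):
    """Return the k most frequent patterns in one partition (local discovery)."""
    return [pattern for pattern, _ in Counter(partition).most_common(k)]

def evaluate(pattern, partition):
    """Count occurrences of a pattern in one partition (local evaluation)."""
    return sum(1 for edge in partition if edge == pattern)

def mine(partitions, k=2):
    """Local discovery on every partition, then global re-evaluation."""
    # Phase 1: each (simulated) worker proposes its k locally best patterns.
    candidates = set()
    for part in partitions:
        candidates.update(local_candidates(part, k))
    # Phase 2: every candidate is re-evaluated on every partition and the
    # local scores are summed into a global score.
    totals = {c: sum(evaluate(c, part) for part in partitions) for c in candidates}
    # Sorting first makes tie-breaking deterministic.
    best = max(sorted(totals), key=totals.get)
    return best, totals
```

In this toy setting, a pattern that is only second best on each partition can still win once its counts are summed across partitions, which is why the re-evaluation step is needed at all. In a real MPI program the two phases would correspond to collective communication steps rather than Python loops.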
Similar resources
Frequent Subgraph Miners: Runtimes Don’t Say Everything
In recent years, several frequent subgraph miners have been proposed. The authors of these new algorithms typically compared the runtimes of their implementations with those of previous implementations to confirm the efficiency of their methods. To get a better perspective on the mutual benefits of the algorithms, Wörlein et al. [9] performed an experimental evaluation of re-implementations of severa...
The ParMol Package for Frequent Subgraph Mining
Mining for frequent subgraphs in a graph database has become a popular topic in recent years. Algorithms to solve this problem are used in chemoinformatics to find common molecular fragments in a database of molecules represented as two-dimensional graphs. However, the search process in arbitrary graph structures includes costly graph and subgraph isomorphism tests. In our ParMol package we h...
Graph-Based Knowledge Discovery: Compression versus Frequency
There are two primary types of graph-based data miners: frequent subgraph miners and compression-based miners. With frequent subgraph miners, the most interesting substructure is the largest one (or ones) that meets the minimum support, whereas compression-based graph miners discover those subgraphs that maximize the amount of compression that a particular substructure provides a graph. The algorithms...
A new algorithm for mining frequent connected subgraphs based on adjacency matrices
Most Frequent Connected Subgraph Mining (FCSM) algorithms have focused on detecting duplicate candidates using canonical form (CF) tests. CF tests have high computational complexity, which affects the efficiency of graph miners. In this paper, we introduce novel properties of the canonical adjacency matrices for reducing the number of CF tests in FCSM. Based on these properties, a n...
On Speeding up Frequent Approximate Subgraph Mining
Frequent approximate subgraph (FAS) mining has become an interesting task with wide applications in several domains of science. Most previous studies have focused on reducing the search space or the number of canonical form (CF) tests. CF tests are commonly used for duplicate detection; however, these tests affect the efficiency of the mining process because they have high computational...