A Direct Algorithm for Computing the Transitive Closure of a Two-Dimensionally Structured File
Authors
Abstract
It is well known that the computation cost of finding the transitive closure (TC) of a graph stored as an adjacency matrix is the same, to within a constant factor, as that of matrix multiplication. In this paper, we present a new TC algorithm based on double hashing and two-dimensionally organized files. We show that, when using this algorithm, the computation and I/O costs of finding the TC of a database relation are like those of performing a relational composition operation. For sparse closures, sparse compositions will be performed, which may be significantly more thrifty than the corresponding matrix operations, which must be at least O(n^2), and for which most algorithms are O(n^3).
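To make the comparison with relational composition concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not give enough detail to reproduce the double-hashing scheme or the two-dimensional file layout. The sketch assumes an in-memory relation stored as a set of (source, target) pairs and uses two hash indexes, one per attribute, to run a Warshall-style closure in which each pivot step touches only tuples that actually exist. The function names `compose` and `transitive_closure` are hypothetical.

```python
from collections import defaultdict


def compose(r, s):
    """Relational composition: {(a, c) | (a, b) in r and (b, c) in s}."""
    # Hash s on its first attribute so each tuple of r probes one bucket.
    by_source = defaultdict(set)
    for b, c in s:
        by_source[b].add(c)
    return {(a, c) for a, b in r for c in by_source.get(b, ())}


def transitive_closure(edges):
    """Warshall-style closure over a sparse relation (illustrative only)."""
    succ = defaultdict(set)  # hash index on the first attribute
    pred = defaultdict(set)  # hash index on the second attribute
    for a, b in edges:
        succ[a].add(b)
        pred[b].add(a)

    for k in set(succ) | set(pred):
        # Snapshot both buckets of the pivot; pairing them is a small
        # composition restricted to paths that pass through k.
        preds_k = list(pred[k])
        succs_k = list(succ[k])
        for a in preds_k:
            for c in succs_k:
                if c not in succ[a]:
                    succ[a].add(c)
                    pred[c].add(a)

    return {(a, c) for a, targets in succ.items() for c in targets}
```

For example, `transitive_closure({(1, 2), (2, 3)})` also contains the pair `(1, 3)`. The work per pivot is proportional to the number of predecessor and successor pairs of that pivot, which for a sparse closure can be far below the n^2 cells a matrix-based method must visit.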
Similar resources
The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution
This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as a linked or not-linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...
Direct Algorithms for Computing the Transitive Closure of Database Relations
We present new algorithms for computing the transitive closure of large database relations. Unlike iterative algorithms, such as the semi-naive and the logarithmic algorithms, the termination of our algorithms does not depend on the length of paths in the underlying graph (hence, the name direct algorithms). We also present simulation results that show that these direct algorithms perform unifo...
Distributed Algorithms for the Transitive Closure
Many database queries, such as reachability and regular path queries, can be reduced to finding the transitive closure of the underlying graph. For calculating the transitive closure of large graphs, a distributed computation framework is required to handle the large data volume (which can approach O(|V|^2) space). MapReduce was not originally designed for recursive computations, but recent wor...
A Parallel and Distributed Approach for Finding Transitive Closures of Data Records: A Proposal
In this paper, we propose an approach to find transitive closures on large data sets in a distributed (i.e., parallel) environment. Finding transitive closures of data records is a preprocessing step of a two-step approach to data quality control, such as data accuracy, redundancy, consistency, currency and completeness. The objective of finding transitive closures is to reduce the number of reco...
Transitive closure algorithm MEMTC and its performance analysis
We present a new algorithm for computing the full transitive closure designed for operation in layered memories. We analyze its average-case performance experimentally in an environment where two layers of memory of different speed are used. In our analysis, we use trace-based simulation of memory operations.