Sublinear Algorithms in the External Memory Model
Abstract
We initiate the study of sublinear-time algorithms in the external memory model [Vit01]. In this model, the data is stored in blocks of a certain size B, and the algorithm is charged a unit cost for each block access. This model is well studied, since it reflects the computational issues that arise when the (massive) input is stored on a disk. Since each block access operates on B data elements in parallel, many problems have external memory algorithms whose number of block accesses is only a small fraction (e.g., 1/B) of their main memory complexity. However, to the best of our knowledge, no such reduction in complexity is known for any sublinear-time algorithm. One plausible explanation is that the vast majority of sublinear-time algorithms use random sampling and thus exhibit no locality of reference. This state of affairs is quite unfortunate, since both sublinear-time algorithms and the external memory model are important approaches to dealing with massive data sets, and ideally they should be combined to achieve the best performance. We show that such a combination is indeed possible. In particular, we consider three well-studied problems: testing of distinctness, uniformity, and identity of an empirical distribution induced by data. For these problems we show random-sampling-based algorithms whose number of block accesses is up to a factor of 1/√B smaller than the main memory complexity of those problems. We also show that this improvement is optimal for those problems. Since these problems are natural primitives for a number of sampling-based algorithms for other problems, our tools improve the external memory complexity of other problems as well.
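The cost model described in the abstract can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the algorithm from the paper: `BlockedArray` and `find_duplicate_by_block_sampling` are hypothetical names, the "disk" is an in-memory list, and the tester simply reads a few random blocks and looks for a repeated value among all elements it has seen. It shows why block sampling is attractive: each unit of cost yields B elements rather than one.

```python
import random


class BlockedArray:
    """Hypothetical external-memory array (an assumption for illustration).

    The data is split into blocks of size B, and every block read is
    charged one unit of cost, as in the external memory model.
    """

    def __init__(self, data, block_size):
        self.data = list(data)
        self.block_size = block_size
        # Number of blocks, rounding up for a partial last block.
        self.num_blocks = (len(self.data) + block_size - 1) // block_size
        self.block_reads = 0  # running cost counter

    def read_block(self, i):
        """Return the i-th block and charge one block access."""
        self.block_reads += 1
        start = i * self.block_size
        return self.data[start:start + self.block_size]


def find_duplicate_by_block_sampling(arr, num_block_samples, rng):
    """Read distinct random blocks and report whether any value repeats.

    Blocks are sampled without replacement, so a repeated value always
    comes from two different positions in the input.
    """
    seen = set()
    k = min(num_block_samples, arr.num_blocks)
    for i in rng.sample(range(arr.num_blocks), k):
        for x in arr.read_block(i):
            if x in seen:
                return True
            seen.add(x)
    return False


rng = random.Random(0)
distinct = BlockedArray(range(1024), block_size=16)
# All values differ, so no duplicate is found and all 8 blocks are read.
print(find_duplicate_by_block_sampling(distinct, 8, rng), distinct.block_reads)
# → False 8
```

In this run, 8 block accesses inspect 8 × 16 = 128 elements, whereas sampling 128 individual elements at random positions would cost up to 128 block accesses. How many blocks are needed for a correct tester, and how to handle inputs where blocks are adversarially arranged, is exactly what the paper analyzes and is not captured by this sketch.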
Related resources
External Sampling
We initiate the study of sublinear-time algorithms in the external memory model [14]. In this model, the data is stored in blocks of a certain size B, and the algorithm is charged a unit cost for each block access. This model is well-studied, since it reflects the computational issues occurring when the (massive) input is stored on a disk. Since each block access operates on B data elements in ...
Validating XML Documents in the Streaming Model with External Memory
We study the problem of validating XML documents of size N against general DTDs in the context of streaming algorithms. The starting point of this work is a well-known space lower bound: there are XML documents and DTDs for which p-pass streaming algorithms require Ω(N/p) space. We show that when allowing access to external memory, there is a deterministic streaming algorithm that solves this pr...
Solving Geometric Problems in Space-Conscious Models
When dealing with massive data sets, standard algorithms may easily “run out of memory”. In this thesis, we design efficient algorithms in space-conscious models. In particular, in-place algorithms, multi-pass algorithms, read-only algorithms, and stream-sort algorithms are studied, and the focus is on fundamental geometric problems, such as 2D convex hulls, 3D convex hulls, Voronoi diagrams an...
External-Memory Breadth-First Search with Sublinear I/O
Breadth-first search (BFS) is a basic graph exploration technique. We give the first external memory algorithm for sparse undirected graphs with sublinear I/O. The best previous algorithm requires Θ(n + ((n+m)/(D·B)) · log_{M/B}((n+m)/B)) I/Os on a graph with n nodes and m edges and a machine with main memory of size M, D parallel disks, and block size B. We present a new approach which requires only O(√(n · (n+...