Efficient Implementation of Lazy Suffix Trees
Abstract
We present an efficient implementation of a write-only top-down construction for suffix trees. Our implementation is based on a new, space-efficient representation of suffix trees which requires only 12 bytes per input character in the worst case, and 8.5 bytes per input character on average for a collection of files of different types. We show how to efficiently implement the lazy evaluation of suffix trees such that a subtree is not evaluated until it is traversed for the first time. Our experiments show that for the problem of searching many exact patterns in a fixed input string, the lazy top-down construction is often faster and more space-efficient than other methods.
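The lazy evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: it does not reproduce the space-efficient representation, and it keeps a list of suffix start positions in every node for simplicity. The names `LazySuffixTree`, `_Node`, `find`, and the `expanded` counter are hypothetical, as is the assumption that the text never contains the `"\0"` terminator.

```python
class _Node:
    """Tree node; `children` stays None until the node is first traversed."""

    def __init__(self, suffixes, depth):
        self.suffixes = suffixes  # start positions of the suffixes below this node
        self.depth = depth        # length of the path label from the root
        self.children = None      # None means "not evaluated yet"


class LazySuffixTree:
    """Illustrative lazy top-down suffix tree (hypothetical names throughout)."""

    def __init__(self, text, terminator="\0"):
        if terminator in text:
            raise ValueError("text must not contain the terminator")
        self.text = text + terminator
        self.root = _Node(list(range(len(self.text))), 0)
        self.expanded = 0  # number of nodes whose children have been built

    def _children(self, node):
        # Evaluate the subtree one level deep, only on first access.
        if node.children is None:
            node.children = self._expand(node)
            self.expanded += 1
        return node.children

    def _expand(self, node):
        # Group the node's suffixes by their next character.
        groups = {}
        for s in node.suffixes:
            pos = s + node.depth
            if pos < len(self.text):
                groups.setdefault(self.text[pos], []).append(s)
        children = {}
        for ch, group in groups.items():
            if len(group) == 1:
                # Leaf: the edge runs to the end of the text.
                depth = len(self.text) - group[0]
            else:
                # Extend the edge over the group's longest common prefix.
                # The unique terminator guarantees the loop stops in bounds.
                depth = node.depth + 1
                while len({self.text[s + depth] for s in group}) == 1:
                    depth += 1
            children[ch] = _Node(group, depth)
        return children

    def find(self, pattern):
        """Return the sorted start positions of `pattern` in the text."""
        node, matched = self.root, 0
        while matched < len(pattern):
            child = self._children(node).get(pattern[matched])
            if child is None:
                return []
            first = child.suffixes[0]
            label = self.text[first + node.depth:first + child.depth]
            k = min(len(label), len(pattern) - matched)
            if label[:k] != pattern[matched:matched + k]:
                return []
            matched += k
            node = child
        return sorted(node.suffixes)


tree = LazySuffixTree("banana")
print(tree.find("ana"))  # [1, 3]
print(tree.expanded)     # 2: only the root and the "a" node were evaluated
```

Only the nodes on the search path are evaluated, so a workload of many pattern queries pays for just the parts of the tree those queries actually visit.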
Related resources
A hard-disk based suffix tree implementation
Suffix trees are incredibly useful structures for computational genomics and combinatorial pattern matching. Due to the small alphabet sizes used in computational genomics, specialised hard-disk based suffix trees have been designed, but the problem of creating an efficient hard-disk based suffix tree for large and unbounded alphabet sizes remains essentially unsolved. We have designed a hard-d...
Efficient Implementation of Suffix Trees
We study the problem of string searching using the traditional approach of storing all unique substrings of the text in a suffix tree. The methods of path compression, level compression and data compression are combined to build a simple, compact and efficient implementation of a suffix tree. Based on a comparative discussion and extensive experiments, we argue that our new data structure is su...
From Nondeterministic Suffix Automaton to Lazy Suffix Tree
Given two strings, a pattern P of length m and a text T of length n over some alphabet Σ of size σ, we consider the exact string matching problem, i.e. we want to report all occurrences of P in T. The well-known Backward-Nondeterministic-DAWG-Matching (BNDM) algorithm is one of the most efficient algorithms for short to moderate length patterns. In this paper – as a prelude – we take the underl...
Compact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth
Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even...
Computing suffix links for suffix trees and arrays
We present a new and simple algorithm to reconstruct suffix links in suffix trees and suffix arrays. The algorithm is based on observations regarding suffix tree construction algorithms. With our algorithm we bring suffix arrays even closer to the ease of use and implementation of suffix trees.
Journal: Softw., Pract. Exper.
Volume: 33
Publication date: 1999