Fast BWT in small space by blockwise suffix sorting
نویسنده
چکیده
The usual way to compute the Burrows–Wheeler transform (BWT) [3] of a text is by constructing the suffix array of the text. Even with space-efficient suffix array construction algorithms [12, 2], the space requirement of the suffix array itself is often the main factor limiting the size of the text that can be handled in one piece, which is crucial for constructing compressed text indexes [4, 5]. Typically, the suffix array needs 4n bytes while the text and the BWT need only n bytes each and sometimes even less, for example 2n bits each for a DNA sequence. We reduce the space dramatically by constructing the suffix array in blocks of lexicographically consecutive suffixes. Given such a block, the corresponding block of the BWT is trivial to compute.
منابع مشابه
A Linear-Time Burrows-Wheeler Transform Using Induced Sorting
To compute Burrows-Wheeler Transform (BWT), one usually builds a suffix array (SA) first, and then obtains BWT using SA, which requires much redundant working space. In previous studies to compute BWT directly [6, 13], one constructs BWT incrementally, which requires O(n logn) time where n is the length of the input text. We present an algorithm for computing BWT directly in linear time by modi...
متن کاملA Fast Suffix-Sorting Algorithm
We present an algorithm to sort all suffixes of x = (x1, . . . , xn) ∈ Xn lexicographically, where X = {0, . . . , q−1}. Fast and efficient sorting of a large amount of data according to its suffix structure (suffix-sorting) is a useful technology in many fields of application, front-most in the field of Data Compression where it is used e.g. for the Burrows and Wheeler Transformation (BWT for ...
متن کاملStructures of String Matching and Data Compression
This doctoral dissertation presents a range of results concerning efficient algorithms and data structures for string processing, including several schemes contributing to sequential data compression. It comprises both theoretic results and practical implementations. We study the suffix tree data structure, presenting an efficient representation and several generalizations. This includes augmen...
متن کاملA Modified Burrows-Wheeler Transformation for Case-Insensitive Search with Application to Suffix Array Compression
Now the Block sorting compression [l] becomes common by its good balance of compression ratio and speed. It has another nice feature, which is the relation between encoding/decoding process and suffix array. The suffix array [2] is a memory-efficient data structure for searching any substring of a text. It is an array of lexicographically sorted pointers to suffixes of a text. It is also used f...
متن کاملLinear-time string indexing and analysis in small space
The field of succinct data structures has flourished over the last 16 years. Starting from the compressed suffix array by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina and Manzini (FOCS 2000), a number of generalizations and applications of string indexes based on the Burrows-Wheeler transform (BWT) have been developed, all taking an amount of space that is close to the input size...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Theor. Comput. Sci.
دوره 387 شماره
صفحات -
تاریخ انتشار 2007