Fast BWT in small space by blockwise suffix sorting

Author

  • Juha Kärkkäinen
Abstract

The usual way to compute the Burrows–Wheeler transform (BWT) [3] of a text is by constructing the suffix array of the text. Even with space-efficient suffix array construction algorithms [12, 2], the space requirement of the suffix array itself is often the main factor limiting the size of the text that can be handled in one piece, which is crucial for constructing compressed text indexes [4, 5]. Typically, the suffix array needs 4n bytes while the text and the BWT need only n bytes each and sometimes even less, for example 2n bits each for a DNA sequence. We reduce the space dramatically by constructing the suffix array in blocks of lexicographically consecutive suffixes. Given such a block, the corresponding block of the BWT is trivial to compute.
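The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names `bwt_full` and `bwt_blockwise`, the `num_blocks` parameter, and the evenly spaced choice of splitter suffixes are assumptions made for illustration, and the text is assumed to end with a unique sentinel `$` that sorts before every other character. The sketch rescans the whole text for each block and compares suffixes by slicing, so it shows only the space idea (one block of suffix positions in memory at a time), not the running time.

```python
def bwt_full(text):
    """Reference BWT via a complete suffix array (all n entries held at once)."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # text[i - 1] wraps to the sentinel when i == 0.
    return "".join(text[i - 1] for i in sa)


def bwt_blockwise(text, num_blocks=4):
    """BWT built one block of lexicographically consecutive suffixes at a time.

    Assumption: text ends with a unique sentinel smaller than all other
    characters, so all suffixes are distinct and the blocks form a partition.
    """
    n = len(text)
    step = max(1, n // num_blocks)
    # Hypothetical splitter choice: evenly spaced positions, ordered by suffix.
    splitters = sorted(range(step, n, step), key=lambda i: text[i:])
    bounds = [None] + splitters + [None]  # None means "no bound on this side"

    out = []
    for lo, hi in zip(bounds, bounds[1:]):
        # Collect only the suffixes that fall in the range [lo, hi).
        block = [
            i for i in range(n)
            if (lo is None or text[i:] >= text[lo:])
            and (hi is None or text[i:] < text[hi:])
        ]
        # Sort this block alone; this is one block of the suffix array.
        block.sort(key=lambda i: text[i:])
        # The matching BWT block is the character preceding each suffix.
        out.extend(text[i - 1] for i in block)
    return "".join(out)


if __name__ == "__main__":
    print(bwt_blockwise("banana$", num_blocks=3))
```

For the input `banana$`, both functions return `annb$aa`. Only the suffix positions of the current block are held in memory, which is the source of the space saving the abstract describes; an efficient implementation would avoid the repeated slicing and full rescans used here.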

Similar resources

A Linear-Time Burrows-Wheeler Transform Using Induced Sorting

To compute Burrows-Wheeler Transform (BWT), one usually builds a suffix array (SA) first, and then obtains BWT using SA, which requires much redundant working space. In previous studies to compute BWT directly [6, 13], one constructs BWT incrementally, which requires O(n log n) time where n is the length of the input text. We present an algorithm for computing BWT directly in linear time by modi...

A Fast Suffix-Sorting Algorithm

We present an algorithm to sort all suffixes of x = (x_1, ..., x_n) ∈ X^n lexicographically, where X = {0, ..., q−1}. Fast and efficient sorting of a large amount of data according to its suffix structure (suffix-sorting) is a useful technology in many fields of application, front-most in the field of Data Compression where it is used e.g. for the Burrows and Wheeler Transformation (BWT for ...

Structures of String Matching and Data Compression

This doctoral dissertation presents a range of results concerning efficient algorithms and data structures for string processing, including several schemes contributing to sequential data compression. It comprises both theoretic results and practical implementations. We study the suffix tree data structure, presenting an efficient representation and several generalizations. This includes augmen...

A Modified Burrows-Wheeler Transformation for Case-Insensitive Search with Application to Suffix Array Compression

Block sorting compression [1] has now become common thanks to its good balance of compression ratio and speed. It has another nice feature, which is the relation between the encoding/decoding process and the suffix array. The suffix array [2] is a memory-efficient data structure for searching any substring of a text. It is an array of lexicographically sorted pointers to suffixes of a text. It is also used f...

Linear-time string indexing and analysis in small space

The field of succinct data structures has flourished over the last 16 years. Starting from the compressed suffix array by Grossi and Vitter (STOC 2000) and the FM-index by Ferragina and Manzini (FOCS 2000), a number of generalizations and applications of string indexes based on the Burrows-Wheeler transform (BWT) have been developed, all taking an amount of space that is close to the input size...

Journal:
  • Theor. Comput. Sci.

Volume 387  Issue 

Pages  -

Publication date 2007