A bioinformatician’s guide to the forefront of suffix array construction algorithms

نویسندگان

  • Anish Man Singh Shrestha
  • Martin C. Frith
  • Paul Horton
چکیده

The suffix array and its variants are text-indexing data structures that have become indispensable in the field of bioinformatics. With the uninitiated in mind, we provide an accessible exposition of the SA-IS algorithm, which is the state of the art in suffix array construction. We also describe DisLex, a technique that allows standard suffix array construction algorithms to create modified suffix arrays designed to enable a simple form of inexact matching needed to support 'spaced seeds' and 'subset seeds' used in many biological applications.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dynamic extended suffix arrays

The suffix tree data structure has been intensively described, studied and used in the eighties and nineties, its linear-time construction counterbalancing his spaceconsuming requirements. An equivalent data structure, the suffix array, has been described by Manber and Myers in 1990. This space-economical structure has been neglected during more than a decade, its construction being too slow. S...

متن کامل

Linear-time Suffix Sorting - A New Approach for Suffix Array Construction

This thesis presents a new approach for linear-time suffix sorting. It introduces a new sorting principle that can be used to build the first non-recursive linear-time suffix array construction algorithm named GSACA. Although GSACA cannot hold up with the performance of state of the art suffix array construction algorithms, the algorithm introduces a lot of new ideas for suffix array constructi...

متن کامل

Fast and Lightweight LCP-Array Construction Algorithms

The suffix tree is a very important data structure in string processing, but it suffers from a huge space consumption. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three (compressed) components: the suffix array, the LCP-array, and data structures for simulating navigational operations on the suffix tree. The LCP-array stores the leng...

متن کامل

Lightweight LCP-Array Construction in Linear Time

The suffix tree is a very important data structure in string processing, but it suffers from a huge space consumption. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three (compressed) components: the suffix array, the LCP-array, and data structures for simulating navigational operations on the suffix tree. The LCP-array stores the leng...

متن کامل

An Incomplex Algorithm for Fast Suffix Array Construction

Our aim is to provide full text indexing data structures and algorithms for universal usage in text indexing. We present a practical algorithm for suffix array construction. The fundamental algorithm is less complex than other construction algorithms. We achieve very fast construction times for common strings as well as for worst case strings by enhancing our basic algorithms with further techn...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 15  شماره 

صفحات  -

تاریخ انتشار 2014