Designing Efficient Spaced Seeds for SOLiD Read Mapping

نویسندگان

  • Laurent Noé
  • Marta Gîrdea
  • Gregory Kucherov
چکیده

The advent of high-throughput sequencing technologies constituted a major advance in genomic studies, offering new prospects in a wide range of applications.We propose a rigorous and flexible algorithmic solution to mapping SOLiD color-space reads to a reference genome. The solution relies on an advanced method of seed design that uses a faithful probabilistic model of read matches and, on the other hand, a novel seeding principle especially adapted to read mapping. Our method can handle both lossy and lossless frameworks and is able to distinguish, at the level of seed design, between SNPs and reading errors. We illustrate our approach by several seed designs and demonstrate their efficiency.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Seed-Set Construction by Equi-entropy Partitioning for Efficient and Sensitive Short-Read Mapping

Spaced seeds have been shown to be superior to continuous seeds for efficient and sensitive homology search based on the seedand-extend paradigm. Much the same is true in genome mapping of high-throughput short-read data. However, a highly sensitive search with multiple spaced patterns often requires the use of a great amount of index data. We propose a novel seed-set construction method for ef...

متن کامل

PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds

MOTIVATION The explosion of next-generation sequencing data has spawned the design of new algorithms and software tools to provide efficient mapping for different read lengths and sequencing technologies. In particular, ABI's sequencer (SOLiD system) poses a big computational challenge with its capacity to produce very large amounts of data, and its unique strategy of encoding sequence data int...

متن کامل

Sensitivity analysis and efficient method for identifying optimal spaced seeds

The novel introduction of spaced seed idea in the filtration stage of sequence comparison by Ma et al. (Bioinformatics 18 (2002) 440) has greatly increased the sensitivity of homology search without compromising the speed of search. Finding the optimal spaced seeds is of great importance both theoretically and in designing better search tool for sequence comparison. In this paper, we study the ...

متن کامل

SpEED: fast computation of sensitive spaced seeds

SUMMARY Multiple spaced seeds represent the current state-of-the-art for similarity search in bioinformatics, with applications in various areas such as sequence alignment, read mapping, oligonucleotide design, etc. We present SpEED, a software program that computes highly sensitive multiple spaced seeds. SpEED can be several orders of magnitude faster and computes better seeds than the existin...

متن کامل

WALT: fast and accurate read mapping for bisulfite sequencing

Whole-genome bisulfite sequencing (WGBS) has emerged as the gold-standard technique in genome-scale studies of DNA methylation. Mapping reads from WGBS requires unique considerations that make the process more time-consuming than in other sequencing applications. Typical WGBS data sets contain several hundred million reads, adding to this analysis challenge. We present the WALT tool for mapping...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 2010  شماره 

صفحات  -

تاریخ انتشار 2010