Fundamental Bounds and Approaches to Sequence Reconstruction from Nanopore Sequencers
Authors
Abstract
Motivation: Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge to their effective use. In this paper, we present a novel information-theoretic analysis of the impact of insertion-deletion (InDel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given InDel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) what is the number of ‘typical’ sequences within the distortion bound induced by InDel errors; and (iii) using repeated extrusion through the nanopore, how many repetitions are needed to reduce the distortion bound so that only one typical sequence exists within it.
Results: Our results provide a number of important insights: (i) the maximum length of a sequence that can be accurately reconstructed in the presence of InDel errors is relatively small; (ii) the number of typical sequences within the distortion bound is large; and (iii) repeated extrusion is an effective technique for unique reconstruction. In particular, we show that the number of repeats grows slowly (logarithmically) with sequence length, implying that through repeated extrusion we can sequence large reads using nanopore sequencers. Since InDel errors are the primary error mode for nanopore sequencers, the results in this paper can be viewed as (tight) bounds on reconstruction lengths and repetitions for accurate reconstruction.
Contact: [email protected]
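The first question in the abstract, how the chance of accurate reconstruction falls with sequence length, can be illustrated with a deliberately simplified model. The Python sketch below is not the paper's information-theoretic analysis: it assumes each base independently suffers an InDel error with a fixed probability, so a single read of a given length is error-free with probability (1 - rate)^length. The function names and the example rate and target are hypothetical, chosen only for illustration.

```python
import math


def error_free_probability(length: int, indel_rate: float) -> float:
    """Probability that one read of `length` bases contains no InDel error.

    Assumption (not from the paper): every base is affected independently
    with the same probability `indel_rate`.
    """
    return (1.0 - indel_rate) ** length


def max_error_free_length(indel_rate: float, target: float) -> int:
    """Largest read length whose error-free probability is still >= target.

    Solves (1 - indel_rate) ** n >= target for the largest integer n.
    """
    return math.floor(math.log(target) / math.log(1.0 - indel_rate))


if __name__ == "__main__":
    rate = 0.05  # hypothetical per-base InDel rate
    for n in (10, 100, 1000):
        print(n, error_free_probability(n, rate))
    print(max_error_free_length(rate, 0.5))
```

Under this toy model the error-free probability decays geometrically with read length, which is consistent in spirit with the abstract's observation that only relatively short sequences can be reconstructed accurately from a single pass.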
Similar resources
Nanopore-based Fourth-generation DNA Sequencing Technology
Nanopore-based sequencers, as the fourth-generation DNA sequencing technology, have the potential to quickly and reliably sequence the entire human genome for less than $1000, and possibly for even less than $100. The single-molecule techniques used by this technology allow us to further study the interaction between DNA and protein, as well as between protein and protein. Nanopore analysis ope...
Nanopore-CMOS Interfaces for DNA Sequencing
DNA sequencers based on nanopore sensors present an opportunity for a significant break from the template-based incumbents of the last forty years. Key advantages ushered by nanopore technology include a simplified chemistry and the ability to interface to CMOS technology. The latter opportunity offers substantial promise for improvement in sequencing speed, size and cost. This paper reviews ex...
Enrichment by hybridisation of long DNA fragments for Nanopore sequencing
Enrichment of DNA by hybridisation is an important tool which enables users to gather target-focused next-generation sequence data in an economical fashion. Current in-solution methods capture short fragments of around 200-300 nt, potentially missing key structural information such as recombination or translocations often found in viral or bacterial pathogens. The increasing use of long-read th...
Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing
Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately character...
Training alignment parameters for arbitrary sequencers with LAST-TRAIN
Summary: LAST-TRAIN improves sequence alignment accuracy by inferring substitution and gap scores that fit the frequencies of substitutions, insertions, and deletions in a given dataset. We have applied it to mapping DNA reads from IonTorrent and PacBio RS, and we show that it reduces reference bias for Oxford Nanopore reads. Availability and Implementation: the source code is freely available ...
Journal: CoRR
Volume: abs/1601.02420
Publication date: 2015