An adaptive decorrelation method removes Illumina DNA base-calling errors caused by crosstalk between adjacent clusters

Authors

  • Bo Wang
  • Lin Wan
  • Anqi Wang
  • Lei M. Li
Abstract

Base-calling accuracy is crucial for high-throughput DNA sequencing and for downstream analyses such as read mapping and genome assembly. We therefore set out to reduce the sequencing errors of Illumina systems by correcting three kinds of crosstalk in the cluster intensity data. We found that signal crosstalk between adjacent clusters accounts for a large portion of sequencing errors in Illumina systems, even after correcting for color crosstalk caused by the overlap of dye emission spectra and for phasing/pre-phasing caused by out-of-step nucleotide synthesis. Importantly, spatial crosstalk between adjacent clusters is cluster-specific and often asymmetric, so it cannot be corrected by existing deconvolution methods. We therefore introduce a novel mathematical method that estimates and removes spatial crosstalk, reducing base-calling errors of Illumina systems by 44-69% at a given mapping rate. Furthermore, the resolution gained from this work leaves room for higher throughput in DNA sequencing and in general measurement systems that use fluorescence-based imaging technology. The resulting base-caller, 3Dec, is available for academic users at http://github.com/flishwnag/3dec. It reduces errors by 62.1% compared with the standard pipeline, and its implementation is fast enough for daily sequencing.
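To make the idea of asymmetric spatial crosstalk concrete, the toy sketch below mixes the four-channel intensities of two adjacent clusters and then undoes the mixing. It is not the 3Dec algorithm: the linear two-cluster model, the function names, and the coefficients `alpha` and `beta` are all assumptions made for illustration, and the coefficients are taken as known, whereas estimating them per cluster is the problem the abstract says the method addresses.

```python
# Toy model (an assumption, not the 3Dec method): two adjacent clusters leak
# into each other with different strengths, channel by channel:
#   obs_a = true_a + alpha * true_b
#   obs_b = true_b + beta  * true_a
BASES = "ACGT"


def remove_pairwise_crosstalk(obs_a, obs_b, alpha, beta):
    """Invert the 2x2 mixing above for every channel, given alpha and beta."""
    det = 1.0 - alpha * beta
    if abs(det) < 1e-12:
        raise ValueError("mixing is singular; the clusters cannot be separated")
    true_a = [(x - alpha * y) / det for x, y in zip(obs_a, obs_b)]
    true_b = [(y - beta * x) / det for x, y in zip(obs_a, obs_b)]
    return true_a, true_b


def call_base(intensities):
    """Call the base whose channel has the highest intensity."""
    return BASES[max(range(4), key=lambda i: intensities[i])]


# A dim cluster emitting 'A' sits next to a bright cluster emitting 'G'.
true_a = [1.0, 0.0, 0.0, 0.0]
true_b = [0.0, 0.0, 3.0, 0.0]
alpha, beta = 0.5, 0.1  # hypothetical, deliberately asymmetric leakage

obs_a = [a + alpha * b for a, b in zip(true_a, true_b)]
obs_b = [b + beta * a for a, b in zip(true_a, true_b)]

fixed_a, fixed_b = remove_pairwise_crosstalk(obs_a, obs_b, alpha, beta)

print("before:", call_base(obs_a), call_base(obs_b))
print("after: ", call_base(fixed_a), call_base(fixed_b))
```

In this example the dim cluster is miscalled before correction because the leakage from its bright neighbor exceeds its own signal, which is the kind of error that a single symmetric deconvolution kernel would not model.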


Related resources

UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing

Amplicon sequencing of tags such as 16S and ITS ribosomal RNA is a popular method for investigating microbial populations. In such experiments, sequence errors caused by PCR and sequencing are difficult to distinguish from true biological variation. I describe UNOISE2, an updated version of the UNOISE algorithm for denoising (error-correcting) Illumina amplicon reads and show that it has compar...

Targeted Sequencing of 179 Genes Associated with Hereditary Retinal Dystrophies and 10 Candidate Genes Identifies Novel and Known Mutations in Patients with Various Retinal Diseases

Supplemental Method Targeted sequence capture and NGS 6 μg of genomic DNA was randomly fragmented with sizes mainly distributed between 250 and 300 bp. Adapters were ligated to both ends of the resulting fragments. DNA was then amplified by ligation-mediated PCR (LM-PCR), purified, and hybridized to the RDs189 array for enrichment, and non-hybridized fragments were then washed out. Both non-cap...

SNP detection for massively parallel whole-genome resequencing.

Next-generation massively parallel sequencing technologies provide ultrahigh throughput at two orders of magnitude lower unit cost than capillary Sanger sequencing technology. One of the key applications of next-generation sequencing is studying genetic variation between individuals using whole-genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection m...

An Efficient Approach in Analysis of DNA Base Calling Using Neural Fuzzy Model

This paper addresses the issues of true representation and provides a reliable measure for analyzing DNA base calling. The method deals with data-set quality in the analysis of DNA sequencing; it investigates the use of neuro-fuzzy techniques for predicting the confidence value for each base in DNA base calling regarding collecting the data for each bas...

VirVarSeq: a low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering

MOTIVATION In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical ...


Journal title:

Volume 7  Issue 

Pages  -

Publication date: 2017