Correcting bias from stochastic insert size in read pair data — applications to structural variation detection and genome assembly
Authors
Abstract
1 KTH Royal Institute of Technology, Science for Life Laboratory, School of Computer Science and Communication, Stockholm, Sweden. 2 Atherosclerosis Research Unit, Department of Medicine, Karolinska Institutet, Stockholm, Sweden. 3 Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden. 4 Swedish e-Science Research Centre (SeRC), Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, Sweden. Contact [email protected]
Similar resources
Detection and characterization of novel sequence insertions using paired-end next-generation sequencing
MOTIVATION In the past few years, human genome structural variation discovery has enjoyed increased attention from the genomics research community. Many studies were published to characterize short insertions, deletions, duplications and inversions, and associate copy number variants (CNVs) with disease. Detection of new sequence insertions requires sequence data, however, the 'detectable' sequ...
SVM2: an improved paired-end-based tool for the detection of small genomic structural variations using high-throughput single-genome resequencing data
Several bioinformatics methods have been proposed for the detection and characterization of genomic structural variation (SV) from ultra high-throughput genome resequencing data. Recent surveys show that comprehensive detection of SV events of different types between an individual resequenced genome and a reference sequence is best achieved through the combination of methods based on different ...
Improved gap size estimation for scaffolding algorithms
MOTIVATION One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates about the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can mislead in subse...
Structural Variation Detection with Read Pair Information - An Improved Null-Hypothesis Reduces Bias
Reads from paired-end and mate-pair libraries are often utilized to find structural variation in genomes, and one common approach is to use their fragment length for detection. After aligning read pairs to the reference, read pair distances are analyzed for statistically significant deviations. However, previously proposed methods are based on a simplified model of observed fragment lengths tha...
EPGA: de novo assembly using the distributions of reads and insert size
MOTIVATION In genome assembly, the primary issue is how to determine upstream and downstream sequence regions of sequence seeds for constructing long contigs or scaffolds. When extending one sequence seed, repetitive regions in the genome always cause multiple feasible extension candidates which increase the difficulty of genome assembly. The universally accepted solution is choosing one based ...