Detailed MATE-CLEVER Pipeline for GoNL
Authors
Abstract
For deletion discovery, we ran the discovery part of MATE-CLEVER [3], with minor modifications that account for differences among library preparation protocols. MATE-CLEVER is an integrated approach. Its main purpose within this project is to discover deletions of 30–100 bp, sometimes termed the "twilight zone of NGS indels". It incorporates CLEVER [2], an internal-segment-size-based approach that has been shown to achieve state-of-the-art performance on indels of 30–100 bp, and LASER [4], a split-read aligner. In a first step, MATE-CLEVER uses CLEVER to discover deletions of 30–100 bp at very high sensitivity. In a second step, it uses LASER to refine the breakpoint annotations made by CLEVER. It also uses several auxiliary tools, described below.

In the following, we describe in full the details and commands needed to reproduce MATE-CLEVER's callset. Because of the minor modifications mentioned above, this pipeline deviates in several details from the description given in [3]. To run the CLEVER-based deletion discovery pipeline, we used revision 3097f2 from the git repository at http://clever-sv.googlecode.com. As mentioned above, the pipeline includes CLEVER [2], LASER [4], and several auxiliary tools described below. To run (and parallelize) the pipeline below, we used the Python-based workflow engine Snakemake [1].
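The two-step logic above, high-sensitivity discovery followed by breakpoint refinement, can be illustrated with a small, self-contained sketch. It is not MATE-CLEVER's code and does not call CLEVER or LASER. The `Deletion` record, the coordinate convention, the `tolerance` parameter, and the nearest-match rule are all hypothetical. In particular, LASER refines breakpoints by split-read alignment, whereas this sketch simply matches each insert-size-based candidate against an independent list of split-read deletion calls.

```python
from dataclasses import dataclass
from typing import List, Optional

# Size window targeted by the deletion discovery step ("twilight zone").
MIN_LEN, MAX_LEN = 30, 100


@dataclass(frozen=True)
class Deletion:
    """Hypothetical deletion record; half-open interval [start, end) is an assumption."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def in_twilight_zone(d: Deletion) -> bool:
    """True if the deletion length lies in the 30-100 bp window."""
    return MIN_LEN <= d.length <= MAX_LEN


def refine(candidate: Deletion, split_calls: List[Deletion],
           tolerance: int) -> Optional[Deletion]:
    """Return the split-read call on the same chromosome whose breakpoints are
    closest to the candidate's (summed distance), if within `tolerance`."""
    best: Optional[Deletion] = None
    best_dist: Optional[int] = None
    for s in split_calls:
        if s.chrom != candidate.chrom:
            continue
        dist = abs(s.start - candidate.start) + abs(s.end - candidate.end)
        if dist <= tolerance and (best_dist is None or dist < best_dist):
            best, best_dist = s, dist
    return best


def two_step_pipeline(insert_size_calls: List[Deletion],
                      split_read_calls: List[Deletion],
                      tolerance: int = 20) -> List[Deletion]:
    """Step 1: keep insert-size-based candidates of 30-100 bp.
    Step 2: replace each candidate's breakpoints with the nearest split-read
    call, if one exists; otherwise keep the candidate's own breakpoints
    (an assumption of this sketch)."""
    result: List[Deletion] = []
    for cand in insert_size_calls:
        if not in_twilight_zone(cand):
            continue
        refined = refine(cand, split_read_calls, tolerance)
        result.append(refined if refined is not None else cand)
    return result


if __name__ == "__main__":
    candidates = [Deletion("1", 1000, 1050), Deletion("1", 5000, 5010)]
    splits = [Deletion("1", 1003, 1051)]
    print(two_step_pipeline(candidates, splits))
```

Here the 50 bp candidate is moved to the nearby split-read breakpoints, and the 10 bp candidate is dropped because it falls outside the 30–100 bp window. Whether the actual pipeline keeps or discards candidates without split-read support is not stated here. The fallback used above is only one possible choice.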