A robust model for read count data in exome sequencing experiments and implications for copy number variant calling
Abstract
MOTIVATION Exome sequencing has proven to be an effective tool for discovering the genetic basis of Mendelian disorders. It is well established that copy number variants (CNVs) contribute to the etiology of these disorders, but calling CNVs from exome sequence data is challenging. A typical read depth strategy uses another sample (or a combination of samples) as a reference to control for variability at the capture and sequencing steps. However, technical variability between samples complicates the analysis and can create spurious CNV calls.
RESULTS Here, we introduce ExomeDepth, a new CNV calling algorithm designed to control for this technical variability. ExomeDepth fits a robust model to the read count data and uses that model to build an optimized reference set that maximizes the power to detect CNVs. As a result, ExomeDepth is effective across a wider range of exome datasets than previously existing tools, even for small (e.g. one to two exons) and heterozygous deletions. We used this new approach to analyse exome data from 24 patients with primary immunodeficiencies. Depending on data quality and the exact target region, we find between 170 and 250 exonic CNV calls per sample. Our analysis identified two novel causative deletions in the genes GATA2 and DOCK8.
AVAILABILITY The code used in this analysis has been implemented as an R package called ExomeDepth, which is available from the Comprehensive R Archive Network (CRAN).
Similar resources
Genome analysis CLAMMS: a scalable algorithm for calling common and rare copy number variants from exome sequencing data
Motivation: Several algorithms exist for detecting copy number variants (CNVs) from human exome sequencing read depth, but previous tools have not been well suited for large population studies on the order of tens or hundreds of thousands of exomes. Their limitations include being difficult to integrate into automated variant-calling pipelines and being ill-suited for detecting common variants....
CLAMMS: a scalable algorithm for calling common and rare copy number variants from exome sequencing data
MOTIVATION Several algorithms exist for detecting copy number variants (CNVs) from human exome sequencing read depth, but previous tools have not been well suited for large population studies on the order of tens or hundreds of thousands of exomes. Their limitations include being difficult to integrate into automated variant-calling pipelines and being ill-suited for detecting common variants. ...
Evaluation of Hybridization Capture Versus Amplicon‐Based Methods for Whole‐Exome Sequencing
Next-generation sequencing has aided characterization of genomic variation. While whole-genome sequencing may capture all possible mutations, whole-exome sequencing remains cost-effective and captures most phenotype-altering mutations. Initial strategies for exome enrichment utilized a hybridization-based capture approach. Recently, amplicon-based methods were designed to simplify preparation a...
Canvas: versatile and scalable detection of copy number variants
MOTIVATION Versatile and efficient variant calling tools are needed to analyze large scale sequencing datasets. In particular, identification of copy number changes remains a challenging task due to their complexity, susceptibility to sequencing biases, variation in coverage data and dependence on genome-wide sample properties, such as tumor polyploidy or polyclonality in cancer samples. RESU...
Modeling read counts for CNV detection in exome sequencing data.
Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, adding difficulty to the problem of identifying CNVs. We present a hidden Markov mode...