Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles
Abstract
Repetitive sequences are biologically and clinically important because they can influence traits and disease, but repeats are challenging to analyse using short-read sequencing technology. We present a tool for genotyping microsatellite repeats called RepeatSeq, which uses Bayesian model selection guided by an empirically derived error model that incorporates sequence and read properties. Next, we apply RepeatSeq to high-coverage genomes from the 1000 Genomes Project to evaluate performance and accuracy. The software uses common formats, such as VCF, for compatibility with existing genome analysis pipelines. Source code and binaries are available at http://github.com/adaptivegenome/repeatseq.
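The idea of choosing a genotype by Bayesian model selection can be illustrated with a minimal sketch. This is not RepeatSeq's implementation: the function name `call_genotype`, the uniform genotype prior, and the single fixed `error_rate` are all assumptions made for illustration. In particular, the fixed rate stands in for the empirically derived error profile, which according to the abstract depends on sequence and read properties.

```python
import math
from itertools import combinations_with_replacement


def call_genotype(read_lengths, reference_length, error_rate=0.05):
    """Pick the most probable diploid genotype for one microsatellite locus.

    read_lengths: repeat-tract length observed in each spanning read.
    reference_length: tract length in the reference genome.
    error_rate: assumed probability that a read reports the wrong length
                (a hypothetical constant, not an empirical profile).
    Returns (genotype, posterior) where genotype is a tuple of two lengths.
    """
    # Candidate alleles: every observed length plus the reference length.
    candidates = sorted(set(read_lengths) | {reference_length})
    n_other = max(len(candidates) - 1, 1)

    def p_read(observed, allele):
        # Toy error model: wrong lengths share the error mass evenly.
        if observed == allele:
            return 1.0 - error_rate
        return error_rate / n_other

    # Log-likelihood of the reads under each unordered genotype,
    # assuming each read is drawn from either allele with probability 0.5.
    log_liks = {}
    for a1, a2 in combinations_with_replacement(candidates, 2):
        log_liks[(a1, a2)] = sum(
            math.log(0.5 * p_read(r, a1) + 0.5 * p_read(r, a2))
            for r in read_lengths
        )

    # Uniform prior over genotypes, so the posterior is the normalized
    # likelihood; subtract the maximum for numerical stability.
    peak = max(log_liks.values())
    total = sum(math.exp(v - peak) for v in log_liks.values())
    posteriors = {g: math.exp(v - peak) / total for g, v in log_liks.items()}

    best = max(posteriors, key=posteriors.get)
    return best, posteriors[best]


if __name__ == "__main__":
    reads = [10] * 9 + [12] * 8 + [11]
    print(call_genotype(reads, reference_length=10))
```

In this sketch a single stray read of length 11 is absorbed by the error term rather than called as an allele. A more realistic model would replace the constant `error_rate` with a value looked up per read and per locus.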
Similar resources
popSTR: population-scale detection of STR variants
Motivation: Microsatellites, also known as short tandem repeats (STRs), are tracts of repetitive DNA sequences containing motifs ranging from two to six bases. Microsatellites are one of the most abundant types of variation in the human genome, after single nucleotide polymorphisms (SNPs) and indels. Microsatellite analysis has a wide range of applications, including medical genetics, forensics a...
megasat: automated inference of microsatellite genotypes from sequence data.
megasat is software that enables genotyping of microsatellite loci using next-generation sequencing data. Microsatellites are amplified in large multiplexes, and then sequenced in pooled amplicons. megasat reads sequence files and automatically scores microsatellite genotypes. It uses fuzzy matches to allow for sequencing errors and applies decision rules to account for amplification artefacts,...
Assessing genetic diversity of promising wheat (Triticum aestivum L.) lines using microsatellite markers linked with salinity tolerance
Narrow genetic variability may leave field crops genetically vulnerable to biotic and abiotic stresses, which can reduce yield. In this study, a set of 37 wheat microsatellite markers linked with identified QTLs for salinity tolerance was used to assess genetic diversity for salinity in 30 promising lines of hexaploid bread wheat (Triticum aestivum L.). A total of 4...
MAGERI: Computational pipeline for molecular-barcoded targeted resequencing
Unique molecular identifiers (UMIs) show outstanding performance in targeted high-throughput resequencing, being the most promising approach for the accurate identification of rare variants in complex DNA samples. This approach has application in multiple areas, including cancer diagnostics, thus demanding dedicated software and algorithms. Here we introduce MAGERI, a computational pipeline tha...
Maximum-likelihood estimation of allelic dropout and false allele error rates from microsatellite genotypes in the absence of reference data.
The importance of quantifying and accounting for stochastic genotyping errors when analyzing microsatellite data is increasingly being recognized. This awareness is motivating the development of data analysis methods that not only take errors into consideration but also recognize the difference between two distinct classes of error, allelic dropout and false alleles. Currently methods to estima...