AlignerBoost: A Generalized Software Toolkit for Boosting Next-Gen Sequencing Mapping Accuracy Using a Bayesian-Based Mapping Quality Framework
Authors
Abstract
Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Repetitive elements in human and other higher eukaryotic genomes account for a large share of ambiguously (non-uniquely) mapped reads. Most available NGS aligners address this either by removing all non-uniquely mapping reads or by reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical, albeit completely lacking at present. Here we developed a generalized software toolkit, "AlignerBoost", which uses a Bayesian-based framework to accurately estimate the mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, and especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising sensitivity, even without mapping quality filters. With higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity than the aligners' default modes, thereby significantly boosting the detection power of NGS aligners even at extreme thresholds. AlignerBoost is also SNP-aware, and higher-quality alignments can be achieved when known SNPs are provided. AlignerBoost's algorithm is computationally efficient and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost.
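The abstract does not spell out the Bayesian framework, so the following is only a minimal sketch of the general idea behind a Bayesian mapping quality, not AlignerBoost's actual implementation. It assumes a uniform prior over a read's candidate hits and takes hypothetical per-hit log10 likelihoods as input; the function name `mapping_qualities` and the cap `max_q` are illustrative choices. Each hit's posterior probability is its normalized likelihood, and the mapping quality is the Phred-scaled probability that the hit is wrong.

```python
import math


def mapping_qualities(log10_likelihoods, max_q=255):
    """Return a Phred-scaled mapping quality for each candidate hit of one read.

    Assumptions (not taken from the paper): a uniform prior over hits, and
    hypothetical log10 likelihoods supplied by the caller.
    """
    if not log10_likelihoods:
        return []
    # Shift by the maximum so the exponentials stay in a safe numeric range.
    best = max(log10_likelihoods)
    weights = [10 ** (ll - best) for ll in log10_likelihoods]
    total = sum(weights)
    quals = []
    for i in range(len(weights)):
        # Posterior probability that hit i is NOT the true origin of the read:
        # the normalized likelihood mass of all the other hits.
        others = sum(w for j, w in enumerate(weights) if j != i)
        p_wrong = others / total
        if p_wrong == 0.0:
            quals.append(max_q)  # unique hit: cap the quality
        else:
            quals.append(min(max_q, round(-10 * math.log10(p_wrong))))
    return quals


# A unique hit gets the capped quality; two equally good hits get ~3 each.
print(mapping_qualities([-1.0]))        # [255]
print(mapping_qualities([-2.0, -2.0]))  # [3, 3]
print(mapping_qualities([0.0, -3.0]))   # [30, 0]
```

Under this scheme a read with two equally likely placements is not discarded; each placement simply receives a low quality, so a downstream quality cutoff decides whether to keep it.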
Similar resources
Comparison of two QTL mapping approaches based on Bayesian inference using high-dense SNPs markers
To compare different QTL mapping methods, a population with genotypic and phenotypic data was simulated. In the Bayesian approach, all marker information can be used along with a combination of distributions of SNP markers. It is assumed that most of the markers (95%) have minor effects and a small number of markers (5%) exert major effects. The simulated population included a basic population of ...
Assessment of Neonate's Congenital Hypothyroidism Pattern Using Poisson Spatio-temporal Model in Disease Mapping under the Bayesian Paradigm during 2011-18 in Guilan, Iran
Background: Congenital Hypothyroidism (CH) is one of the reasons for mental retardation and defective growth in neonates. It can be treated if it is diagnosed early. The congenital hypothyroidism can be diagnosed using newborn screening in the first days after birth. Disease mapping helps to identify high-risk areas of the disease. This study aimed to evaluate the pattern of CH using the Poisso...
Predictive Risk Mapping of Leptospirosis for North of Iran Using Pseudo-absences Data
Leptospirosis is a common zoonotic disease with a high prevalence worldwide and is recognized as an important public health problem in both developing and developed countries owing to epidemics and increasing prevalence. Because of the high diversity of hosts that are capable of carrying the causative agent, this disease has an expansive geographical reach. Various environmental and social ...
Landslide Hazard Zoning Using Bayesian Theory
The aim of the present research is landslide hazard zoning using Bayesian theory in a part of Golestan province. For this purpose, a landslide inventory map was created from the landslide locations in the landslide database (392 landslide locations). Then, the maps of effective parameters in landslide such as slope degree, aspect, altitude, slope curvature, geology, land use, distance of drainage, distance of...
Accurate estimation of short read mapping quality for next-generation genome sequencing
MOTIVATION: Several software tools specialize in the alignment of short next-generation sequencing reads to a reference sequence. Some of these tools report a mapping quality score for each alignment; in principle, this quality score tells researchers the likelihood that the alignment is correct. However, the reported mapping quality often correlates weakly with actual accuracy and the qualities ...