The large-scale blast score ratio (LS-BSR) pipeline: a method to rapidly compare genetic content between bacterial genomes
نویسندگان
چکیده
Background. As whole genome sequence data from bacterial isolates becomes cheaper to generate, computational methods are needed to correlate sequence data with biological observations. Here we present the large-scale BLAST score ratio (LS-BSR) pipeline, which rapidly compares the genetic content of hundreds to thousands of bacterial genomes, and returns a matrix that describes the relatedness of all coding sequences (CDSs) in all genomes surveyed. This matrix can be easily parsed in order to identify genetic relationships between bacterial genomes. Although pipelines have been published that group peptides by sequence similarity, no other software performs the rapid, large-scale, full-genome comparative analyses carried out by LS-BSR. Results. To demonstrate the utility of the method, the LS-BSR pipeline was tested on 96 Escherichia coli and Shigella genomes; the pipeline ran in 163 min using 16 processors, which is a greater than 7-fold speedup compared to using a single processor. The BSR values for each CDS, which indicate a relative level of relatedness, were then mapped to each genome on an independent core genome single nucleotide polymorphism (SNP) based phylogeny. Comparisons were then used to identify clade specific CDS markers and validate the LS-BSR pipeline based on molecular markers that delineate between classical E. coli pathogenic variant (pathovar) designations. Scalability tests demonstrated that the LS-BSR pipeline can process 1,000 E. coli genomes in 27-57 h, depending upon the alignment method, using 16 processors. Conclusions. LS-BSR is an open-source, parallel implementation of the BSR algorithm, enabling rapid comparison of the genetic content of large numbers of genomes. The results of the pipeline can be used to identify specific markers between user-defined phylogenetic groups, and to identify the loss and/or acquisition of genetic information between bacterial isolates. Taxa-specific genetic markers can then be translated into clinical diagnostics, or can be used to identify broadly conserved putative therapeutic candidates.
منابع مشابه
Phage_Finder: Automated identification and classification of prophage regions in complete bacterial genome sequences
Phage_Finder, a heuristic computer program, was created to identify prophage regions in completed bacterial genomes. Using a test dataset of 42 bacterial genomes whose prophages have been manually identified, Phage_Finder found 91% of the regions, resulting in 7% false positive and 9% false negative prophages. A search of 302 complete bacterial genomes predicted 403 putative prophage regions, a...
متن کاملGenomic Characterization of Burkholderia pseudomallei Isolates Selected for Medical Countermeasures Testing: Comparative Genomics Associated with Differential Virulence
Burkholderia pseudomallei is the causative agent of melioidosis and a potential bioterrorism agent. In the development of medical countermeasures against B. pseudomallei infection, the US Food and Drug Administration (FDA) animal Rule recommends using well-characterized strains in animal challenge studies. In this study, whole genome sequence data were generated for 6 B. pseudomallei isolates p...
متن کاملRobust high-throughput prokaryote de novo assembly and improvement pipeline for Illumina data
The rapidly reducing cost of bacterial genome sequencing has lead to its routine use in large-scale microbial analysis. Though mapping approaches can be used to find differences relative to the reference, many bacteria are subject to constant evolutionary pressures resulting in events such as the loss and gain of mobile genetic elements, horizontal gene transfer through recombination and genomi...
متن کاملmGenomeSubtractor: a web-based tool for parallel in silico subtractive hybridization analysis of multiple bacterial genomes
mGenomeSubtractor performs an mpiBLAST-based comparison of reference bacterial genomes against multiple user-selected genomes for investigation of strain variable accessory regions. With parallel computing architecture, mGenomeSubtractor is able to run rapid BLAST searches of the segmented reference genome against multiple subject genomes at the DNA or amino acid level within a minute. In addit...
متن کاملExperimental Investigation of the Effect of Splitter Plate Angle on the Under-Scouring of Submarine Pipeline Due to Steady Current and Clear Water Condition
Submarine pipelines are appropriate method for transmission of oil and gas from sea bed. Free spans may occur due to the natural uneven seabed or by under-scouring. Vortex Induced Vibration (VIV) can happen in such free spans at high Reynolds number. Resonance occurs if the frequency of vortex shedding is close to the pipeline’s natural frequency leading to its fatigue that can break the pipeli...
متن کامل