Genome sequence analysis with MonetDB: a case study on Ebola virus diversity
Abstract
Next-generation sequencing (NGS) technology has led the life sciences into the big data era. Today, sequencing a genome takes little time and money, but yields terabytes of data to be stored and analyzed. Biologists often face excessively time-consuming and error-prone data management and analysis hurdles. In this paper, we propose an approach based on a database management system (DBMS) to accelerate and substantially simplify genome sequence analysis. We have extended MonetDB, an open-source column-based DBMS, with a BAM module, which enables easy, flexible, and rapid management and analysis of sequence alignment data stored as Sequence Alignment/Map (SAM/BAM) files. We describe the main features of MonetDB/BAM using a case study on Ebola virus genomes.
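The general idea of managing alignment data in a DBMS can be illustrated with a small sketch. It does not use MonetDB or its BAM module: as an assumption made purely for illustration, it substitutes Python's built-in `sqlite3` for the column store, and the table name `alignments`, the helper functions, and the four SAM records are all hypothetical. It loads the mandatory SAM fields into a relational table and then answers a typical question (how many mapped, high-quality alignments fall on each reference) with a single SQL query.

```python
import sqlite3

# Hypothetical SAM records (tab-separated, 11 mandatory fields); not real data.
SAM_TEXT = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tref_A\t101\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t16\tref_A\t150\t12\t8M\t*\t0\t0\tTTGACCAA\tIIIIIIII",
    "read3\t0\tref_B\t20\t42\t8M\t*\t0\t0\tGGCATCGA\tIIIIIIII",
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\tNNNNNNNN\tIIIIIIII",
])


def parse_sam(text):
    """Yield (qname, flag, rname, pos, mapq, cigar, seq) for each alignment line."""
    for line in text.splitlines():
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        f = line.split("\t")
        yield (f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5], f[9])


def load_alignments(conn, text):
    """Create the illustrative 'alignments' table and fill it from SAM text."""
    conn.execute(
        "CREATE TABLE alignments ("
        "qname TEXT, flag INTEGER, rname TEXT, pos INTEGER, "
        "mapq INTEGER, cigar TEXT, seq TEXT)"
    )
    conn.executemany(
        "INSERT INTO alignments VALUES (?, ?, ?, ?, ?, ?, ?)", parse_sam(text)
    )


def high_quality_counts(conn, min_mapq):
    """Count mapped alignments per reference with MAPQ >= min_mapq."""
    # Bit 0x4 of the SAM FLAG field marks an unmapped read.
    return conn.execute(
        "SELECT rname, COUNT(*) FROM alignments "
        "WHERE (flag & 4) = 0 AND mapq >= ? "
        "GROUP BY rname ORDER BY rname",
        (min_mapq,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_alignments(conn, SAM_TEXT)
print(high_quality_counts(conn, 30))
```

Once the records are in a table, filters and aggregations that would otherwise need custom scripts become declarative queries; a column store such as MonetDB applies the same idea at a much larger scale.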
Similar resources
Phylogenetic analysis and genetic variation of Tomato yellow leaf curl virus based on the V1 gene in Iraq
Tomato yellow leaf curl virus (TYLCV) is a major pathogen in tropical and subtropical areas. During 2014-2015, a total of 393 tomato samples showing Tomato yellow leaf curl disease (TYLCD) symptoms were collected from six different provinces of Iraq. In serological assays, 55 out of 393 samples (14%) reacted positively with TYLCV-specific antibodies. The presence of TYLCV was verified in 21 (...
The effect of temperature on the binding affinity of Remdesivir and RdRp enzyme of SARS-COV-2 virus using steered molecular dynamics simulation
The fatal SARS-COV-2 virus first appeared in China at the end of 2019. Its sequence is similar to that of the 2002 SARS-COV, but its infection rate is much higher. SARS-COV-2 is an RNA virus and requires RNA-dependent RNA polymerase (RdRp) to transcribe its viral genome. Due to the availability of the active site of this enzyme, an effective treatment is targeting ...
Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species
Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to the D genome and is an important wild bio-source for cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information is a versatile tool for species identification and phylogenetic implications in plants. Different ch...
Genome sequence analysis of Ebola virus in clinical samples from three British healthcare workers, August 2014 to March 2015.
We determined complete viral genome sequences from three British healthcare workers infected with Ebola virus (EBOV) in Sierra Leone, directly from clinical samples. These sequences closely resemble those previously observed in the current Ebola virus disease outbreak in West Africa, with glycoprotein and polymerase genes showing the most sequence variation. Our data indicate that current PCR d...
Complete Genomic Sequence of a Strain of Tomato Yellow Leaf Curl Virus from Iran
Background and Aims: Tomato yellow leaf curl virus (TYLCV) is one of the most destructive viruses of tomato and can reduce tomato yield by up to 100% in tropical and subtropical regions. In this study, the complete sequence of a TYLCV isolate from Hormozgan province, Iran, and its recombination event were determined. Methods: TYLCV-infected tomato was collected from Hormozgan province. Total D...