Sequence-specific reconstruction from fragmentary databases using seed sequences: implementation and validation on SAGE, proteome and generic sequencing data
Abstract
MOTIVATION: DNA assembly programs classically perform an all-against-all comparison of reads to identify overlaps, followed by a multiple sequence alignment and generation of a consensus sequence. If the aim is to assemble a particular segment rather than a whole genome or transcriptome, a target-specific assembly is a more sensible approach. GenSeed is a Perl program that implements a seed-driven recursive assembly, in which each cycle comprises a similarity search, read selection and assembly. The iterative process progressively extends the original seed sequence. GenSeed was tested and validated in many applications, including the reconstruction of nuclear genes or segments, full-length transcripts and extrachromosomal genomes. The robustness of the method was confirmed using a variety of DNA and protein seeds, including short sequences derived from SAGE and proteome projects.

AVAILABILITY: GenSeed is available under the GNU General Public License at http://www.coccidia.icb.usp.br/genseed/
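The seed-driven cycle described above (similarity search, read selection, assembly, repeated until the seed stops growing) can be sketched in Python as follows. This is a toy illustration under loudly stated assumptions, not GenSeed's Perl implementation: exact shared k-mers stand in for a real similarity search, reads are assumed to be error-free and on the same strand as the seed, and the assembly step is a greedy exact suffix/prefix merge rather than a real assembler with consensus calling. All function names and parameters (`kmers`, `overlap`, `extend_once`, `seed_assemble`, `k`, `min_overlap`, `max_cycles`) are hypothetical.

```python
"""Toy seed-driven recursive assembly (hypothetical sketch, not GenSeed's code).

Assumptions: error-free reads from the same strand as the seed, exact shared
k-mers as the similarity search, and greedy exact suffix/prefix merging as the
assembly step.
"""


def kmers(seq: str, k: int) -> set[str]:
    """Return every length-k substring of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overlap(a: str, b: str, min_len: int) -> int:
    """Longest n >= min_len with a's last n chars equal to b's first n; 0 if none."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def extend_once(contig: str, reads: list[str], k: int, min_overlap: int) -> str:
    """Run one cycle (search, select, assemble) and return the possibly longer contig."""
    window = 2 * k  # hypothetical: only the contig ends are used as queries
    left_query = kmers(contig[:window], k)
    right_query = kmers(contig[-window:], k)
    best_left, best_right = "", ""
    for read in reads:
        read_kmers = kmers(read, k)
        # Similarity search + read selection: a shared k-mer with either end.
        if read_kmers & right_query:
            n = overlap(contig, read, min_overlap)  # read hangs off the right end
            if n and len(read) - n > len(best_right):
                best_right = read[n:]
        if read_kmers & left_query:
            n = overlap(read, contig, min_overlap)  # read hangs off the left end
            if n and len(read) - n > len(best_left):
                best_left = read[:len(read) - n]
    # Assembly: keep the longest overhang found on each side.
    return best_left + contig + best_right


def seed_assemble(seed: str, reads: list[str], k: int = 6,
                  min_overlap: int = 8, max_cycles: int = 100) -> str:
    """Repeat extension cycles until the contig stops growing or max_cycles is hit."""
    contig = seed
    for _ in range(max_cycles):
        extended = extend_once(contig, reads, k, min_overlap)
        if extended == contig:
            break
        contig = extended
    return contig


if __name__ == "__main__":
    # Hypothetical 44-nt target tiled by 12-nt reads every 4 nt.
    genome = "ATGCGTACCTTAGGACTCAATCGCTGAAGTTCCGGATACAGTCA"
    reads = [genome[s:s + 12] for s in range(0, len(genome) - 11, 4)]
    seed = genome[16:32]
    contig = seed_assemble(seed, reads)
    print(contig == genome)  # True
```

Starting from the 16-nt seed in the middle of the toy target, each cycle grows the contig by one read step (4 nt) at each end until that end of the target is reached, and the script prints `True`. Each cycle compares the reads only against the two contig ends rather than against each other, which mirrors the contrast drawn above with all-against-all assembly. A realistic version would need mismatch-tolerant alignment, both strands, protein-to-DNA searches for protein seeds, and a proper assembler.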
Related articles
Isolation and identification of Eurotium species from contaminated rice by morphology and DNA sequencing
Thirty milled rice samples were collected from retailers in four states of Malaysia. These samples were evaluated for Eurotium spp. contamination by direct plating on malt extract salt agar (MESA). All Eurotium strains were isolated and identified based on morphology and the nucleotide sequences of the internal transcribed spacer 1 (ITS1) and ITS2 regions of the rDNA. Four Eurotium species (E. rubrum, E. amstelodami, E....
P-215: Discovery of A Novel APA Variant of A Human Potential Gene Based on Expressed Sequenced Tags Analysis
Background: Expressed sequence tags (ESTs) are sequences of cDNA fragments prepared from different tissue sources. There are over one million of these sequences in the publicly available database, and they are believed to represent more than half of all human genes. The ESTs belong to different cDNA libraries, each prepared from one particular cell type, organ, or tumor. Therefore, th...
BIR Pipeline for Preparation of Phylogenomic Data
SUMMARY: We present a pipeline named BIR (Blast, Identify and Realign) developed for phylogenomic analyses. BIR is intended for the identification of gene sequences suitable for phylogenomic inference. The pipeline allows users to apply their own manually curated sequence alignments (seeds) in the search for homologous genes in sequence databases and available genomes. BIR automatically adds the id...
Sequencing and Bioinformatics Analysis of Kappa-Casein Exon 4 Gene in Iranian Bacterianus and Dromedaries Camels
Kappa-casein, a major protein component of mammalian milk, plays an essential role in the formation and stabilization of milk micelles, preventing them from aggregating and thereby helping to keep calcium phosphate in solution and to transfer calcium and phosphorus from animal milk to consumers. Therefore, the objective of the current study was to investigate genetic and phylogenetic analysis ...
An Evolutionary and Phylogenetic Study of the BMP15 Gene
DNA sequence data contains a wealth of biologically useful information. Recent innovations in DNA sequencing technology have greatly increased our capacity to determine massive amounts of nucleotide sequences. These sequences can be used to specify the characteristics of different regions, interpret the evolutionary relationships between categorized groups, likelihood of performing multiple com...
Journal: Bioinformatics
Volume 24, Issue 15
Publication date: 2008