Sequence-specific reconstruction from fragmentary databases using seed sequences: implementation and validation on SAGE, proteome and generic sequencing data

نویسندگان

  • Tiago José P. Sobreira
  • Arthur Gruber
چکیده

MOTIVATION DNA assembly programs classically perform an all-against-all comparison of reads to identify overlaps, followed by a multiple sequence alignment and generation of a consensus sequence. If the aim is to assemble a particular segment, instead of a whole genome or transcriptome, a target-specific assembly is a more sensible approach. GenSeed is a Perl program that implements a seed-driven recursive assembly consisting of cycles comprising a similarity search, read selection and assembly. The iterative process results in a progressive extension of the original seed sequence. GenSeed was tested and validated on many applications, including the reconstruction of nuclear genes or segments, full-length transcripts, and extrachromosomal genomes. The robustness of the method was confirmed through the use of a variety of DNA and protein seeds, including short sequences derived from SAGE and proteome projects. AVAILABILITY GenSeed is available under the GNU General Public License at http://www.coccidia.icb.usp.br/genseed/

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Isolation and identification of Eurotium species from contaminated rice by morphology and DNA sequencing

30 milled rice samples were collected from retailers in four states of Malaysia. These samples were evaluated for Eurotium spp. contaminations by direct plating on malt extract salt agar (MESA). All Eurotium were isolated and identified based on morphology and nucleotide sequences of internal transcribed spacer 1 (ITS1) and ITS2 of the rDNA.  Four Eurotium species (E. rubrum, E. amstelodami, E....

متن کامل

P-215: Discovery of A Novel APA Variant of A Human Potential Gene Based on Expressed Sequenced Tags Analysis

Background: Expressed sequence tags (ESTs) are sequences of cDNA fragments prepared from different tissue sources. There are over one million of these sequences in the publicly available database, and these sequences are believed to represent more than half of all human genes. The ESTs belong to different cDNA libraries, was prepared from one particular cell type, organ, or tumor. Therefore, th...

متن کامل

BIR Pipeline for Preparation of Phylogenomic Data

SUMMARY We present a pipeline named BIR (Blast, Identify and Realign) developed for phylogenomic analyses. BIR is intended for the identification of gene sequences applicable for phylogenomic inference. The pipeline allows users to apply their own manually curated sequence alignments (seed) in search for homologous genes in sequence databases and available genomes. BIR automatically adds the id...

متن کامل

Sequencing and Bioinformatics Analysis of Kappa-Casein Exon 4 Gene in Iranian Bacterianus and Dromedaries Camels

Kappa-casein, as a major protein component in mammalian milk, plays an essential role in formation and stabilization milk micelles and preventing them from aggregating and therefore, helping to keep calcium phosphate in solution and transfer of calcium and phosphors from animal milk to consumers. Therefore, the objective of the current study was to investigate genetic and phylogenetic analysis ...

متن کامل

An Evolutionary and Phylogenetic Study of the BMP15 Gene

DNA sequence data contains a wealth of biologically useful information. Recent innovations in DNA sequencing technology have greatly increased our capacity to determine massive amounts of nucleotide sequences. These sequences can be used to specify the characteristics of different regions, interpret the evolutionary relationships between categorized groups, likelihood of performing multiple com...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Bioinformatics

دوره 24 15  شماره 

صفحات  -

تاریخ انتشار 2008