A Study of GeneWise with the Drosophila Adh Region

Authors

  • Yi Mo
  • Moira Regelson
  • Mike Sievers
Abstract

GeneWise is one of the most accurate computer programs for gene finding, but it is very computationally expensive. Paracel has accelerated GeneWise on its sequence analysis supercomputer, GeneMatcher™. In this study, the performance and scientific validity of Paracel GeneWise (PGW) were assessed by comparing PGW to software GeneWise (SGW), using the Drosophila Adh region as the benchmark. For an equivalent search, PGW running on GeneMatcher2 was 2735 times faster than SGW running on a single 700-MHz Pentium III processor, and yielded effectively the same results. A search was also performed for all Pfam hits in the Adh sequence, comparing PGW to a heuristically accelerated GeneWise (HAGW) approach called "HalfWise." HalfWise uses BLASTX to select candidate Pfam hidden Markov models (HMMs) for further analysis with the more computationally expensive GeneWise. The PGW approach had a sensitivity and specificity of up to 87% and 88%, respectively, for identifying Pfam HMM hits, compared to 59% and 93% with the HAGW approach. The exceptional speed and proven scientific validity of Paracel GeneWise make it an indispensable tool for annotation in the genomic era.

INTRODUCTION

With the flood of genomic DNA sequence data, including the human genome (Venter et al., 2001; International Human Genome Sequencing Consortium, 2001), the need for computational tools that annotate genomes rapidly and accurately is ever more pressing. Among the software programs for gene finding and genome annotation in large DNA sequences, GeneWise (Birney and Durbin, 2000; http://www.sanger.ac.uk/Software/Wise2) stands out as one of the most accurate (Guigo et al., 2000). GeneWise is a protein-homology-based program that uses hidden Markov models (HMMs) to find genes in genomic DNA sequences. By incorporating a protein profile-HMM and a model of DNA splice sites, GeneWise finds the best gene structure prediction and, simultaneously, the alignment of the genomic sequence to the protein profile-HMM or protein sequence. However, GeneWise is a very computationally expensive dynamic programming algorithm, so it is not yet widely used for large-scale genome annotation. To make GeneWise a more practical tool in the genomic age, Paracel has implemented and vastly accelerated the algorithm on GeneMatcher™, a supercomputer for biological sequence analysis.

To assess the performance of Paracel GeneWise (PGW) relative to software GeneWise (SGW), and its scientific validity relative to heuristically accelerated GeneWise (HAGW), we evaluated each approach on a genomic DNA sequence contig of about 2.9 Mb from the Drosophila Adh region. This region has been extensively studied and was used in the Genome Annotation Assessment Project (GASP) (Reese et al., 2000). In this evaluation, we focused on finding all Pfam (Bateman et al., 2000) protein profile-HMMs that occur in the Adh genomic sequence, a study similar to one done by Birney and Durbin (2000). Pfam is a database of protein profile-HMMs and multiple sequence alignments for protein domains and families. It is widely used for genome annotation because functional information can be inferred from similarities to protein domains and families.
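The sensitivity and specificity figures quoted in the abstract can be illustrated with a minimal sketch. The excerpt does not define the two measures, so the code assumes the convention common in gene-finding evaluations: sensitivity = TP / (TP + FN) and specificity = TP / (TP + FP). The function name and the HMM identifiers are hypothetical placeholders, not data from the study.

```python
def sensitivity_specificity(predicted, reference):
    """Compare a set of predicted HMM hits with a reference set.

    Assumed definitions (not stated in the excerpt):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    """
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)   # hits found and present in the reference
    fn = len(reference - predicted)   # reference hits that were missed
    fp = len(predicted - reference)   # predicted hits absent from the reference
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tp / (tp + fp) if (tp + fp) else 0.0
    return sn, sp


# Hypothetical identifiers, for illustration only.
reference_hits = {"hmm_a", "hmm_b", "hmm_c", "hmm_d"}
predicted_hits = {"hmm_a", "hmm_b", "hmm_c", "hmm_x"}

sn, sp = sensitivity_specificity(predicted_hits, reference_hits)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Under these definitions, a prefilter that discards candidate HMMs before the expensive alignment step can only lower or preserve sensitivity, which is consistent with the direction of the trade-off reported for the heuristic approach.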

Related resources

Using GeneWise in the Drosophila annotation experiment.

The GeneWise method for combining gene prediction and homology searches was applied to the 2.9-Mb region from Drosophila melanogaster. The results from the Genome Annotation Assessment Project (GASP) showed that GeneWise provided reasonably accurate gene predictions. Further investigation indicates that many of the incorrect gene predictions from GeneWise were due to transposons with valid prot...

Organization and evolution of the alcohol dehydrogenase gene in Drosophila.

The alcohol dehydrogenase (Adh) gene was isolated from Drosophila simulans and D. mauritiana, and the DNA sequence of a 4.6-kb region, containing the structural gene and flanking sequence, was determined for each. These sequences were compared with the Adh region of D. melanogaster to characterize changes that occur in the Drosophila genome during evolution and to identify conserved sequences o...

Nonfixed duplication containing the Adh gene and a truncated form of the Adhr gene in the Drosophila funebris species group: different modes of evolution of Adh relative to Adhr in Drosophila.

The sequence of the genomic region that contains the Adh and Adhr genes of Drosophila funebris was used to demonstrate that both genes are present in species of the funebris group. The sequence of this genomic region reveals a 2.9-kb tandem duplication which encompasses 1.6 kb of the 5' flanking region, the entire Adh gene, and two thirds of the first exon of the Adhr gene in D. funebris. This ...

Characterization of the structure and evolution of the Adh region of Drosophila hydei.

Drosophila of the repleta group have a duplication of the gene which encodes alcohol dehydrogenase (ADH). We report the nucleotide sequence of an 8.4-kb region of genomic DNA of Drosophila hydei which includes the entire Adh region. Analysis of this sequence reveals similarity in organization to the Adh region of Drosophila mojavensis and Drosophila mulleri of the mulleri subgroup, with three g...

Structure and evolution of the Adh genes of Drosophila mojavensis.

The nucleotide sequence of the Adh region of Drosophila mojavensis has been completed and the region found to contain a pseudogene, Adh-2 and Adh-1 arranged in that order. Comparison of the sequence divergence of these genes to one another and to the Adh region of Drosophila mulleri and other species has allowed the development of a model for the evolution of the duplication of the Adh genes. T...

Publication date: 2001