Testing the performance of automated annotation of ESTs with the Kegg Orthology (KO) database demonstrates lack of completeness of clusters.
Abstract
We tested the KEGG Orthology (KO) database as a source for automated annotation of expressed sequence tags (ESTs). We ran a control experiment, in which every EST was assigned to its cognate protein, and an annotation experiment, in which the ESTs were annotated using proteins from other organisms. Analyzing the results, we assigned each annotation to one of three classes: correct, changed, or speculated. Correct annotation ranged from 57% (Caenorhabditis elegans) to 81% (Homo sapiens). Although changed annotation was low (1% in H. sapiens to 9% in Arabidopsis thaliana), speculated annotation was very high (18% in H. sapiens to 38% in C. elegans). We propose eliminating part of the speculated annotation by using the KEGG GENES database to enrich KO clusters, which decreased speculation from 38% to 2% in C. elegans. The KO database therefore still requires effort to move sequences from KEGG GENES into KO in order to improve annotation performance.
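The comparison between the control and annotation experiments can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not define the three classes, so the rules below are assumptions (correct = same KO in both experiments, changed = different KO, speculated = annotated with no control KO to confirm it). The function names, the extra `unannotated` label, and the KO identifiers in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def classify(control_ko: Optional[str], annotated_ko: Optional[str]) -> str:
    """Classify one EST by comparing its two KO assignments.

    control_ko:   KO of the EST's cognate protein (None if it has none).
    annotated_ko: KO assigned via proteins from other organisms (None if no hit).
    The class rules are assumptions made for this sketch.
    """
    if annotated_ko is None:
        return "unannotated"  # hypothetical extra label: nothing to evaluate
    if control_ko is None:
        return "speculated"   # annotation exists but the control cannot confirm it
    if control_ko == annotated_ko:
        return "correct"
    return "changed"


def class_percentages(
    pairs: Iterable[Tuple[Optional[str], Optional[str]]]
) -> Dict[str, float]:
    """Return the percentage of ESTs falling into each class."""
    counts = Counter(classify(control, annotated) for control, annotated in pairs)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}
```

For example, `class_percentages([("K00001", "K00001"), ("K00001", "K00002"), (None, "K00003")])` puts one EST in each of the three classes. Under these assumed rules, adding a cognate protein's KO to the control set moves an EST out of the speculated class, which is the effect the abstract attributes to enriching KO clusters from KEGG GENES.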
Related articles
Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary
MOTIVATION: High-throughput technologies such as DNA sequencing and microarrays have created the need for automated annotation of large sets of genes, including whole genomes, and automated identification of pathways. Ontologies, such as the popular Gene Ontology (GO), provide a common controlled vocabulary for these types of automated analysis. Yet, while GO offers tremendous value, it also has...
BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences.
BlastKOALA and GhostKOALA are automatic annotation servers for genome and metagenome sequences, which perform KO (KEGG Orthology) assignments to characterize individual gene functions and reconstruct KEGG pathways, BRITE hierarchies and KEGG modules to infer high-level functions of the organism or the ecosystem. Both servers are made freely available at the KEGG Web site (http://www.kegg.jp/bla...
KEGG as a reference resource for gene and protein annotation
KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, ...
KOBAS server: a web-based platform for automated annotation and pathway identification
There is an increasing need to automatically annotate a set of genes or proteins (from genome sequencing, DNA microarray analysis or protein 2D gel experiments) using controlled vocabularies and identify the pathways involved, especially the statistically enriched pathways. We have previously demonstrated the KEGG Orthology (KO) as an effective alternative controlled vocabulary and developed a ...
KAAS: KEGG Automatic Annotation Server
The number of complete and draft genomes has rapidly increased in recent years, and it has become increasingly important to identify the functional properties and biological roles of genes in these genomes. We have been developing KEGG Orthology (KO) to classify gene functions. In KO, we annotate genes in complete genomes based on best-hit information using Smith-Waterman scores, as well as by ...
Journal: Genetics and Molecular Research (GMR)
Volume 7, Issue 3
Publication date: 2008