Functional annotation of the Arabidopsis genome using controlled vocabularies.
Abstract
Controlled vocabularies are increasingly used by databases to describe genes and gene products because they facilitate identification of similar genes within an organism or among different organisms. One of The Arabidopsis Information Resource's goals is to associate all Arabidopsis genes with terms developed by the Gene Ontology Consortium that describe the molecular function, biological process, and subcellular location of a gene product. We have also developed terms describing Arabidopsis anatomy and developmental stages and use these to annotate published gene expression data. As of March 2004, we used computational and manual annotation methods to make 85,666 annotations representing 26,624 unique loci. We focus on associating genes to controlled vocabulary terms based on experimental data from the literature and use The Arabidopsis Information Resource-developed PubSearch software to facilitate this process. Each annotation is tagged with a combination of evidence codes, evidence descriptions, and references that provide a robust means to assess data quality. Annotation of all Arabidopsis genes will allow quantitative comparisons between sets of genes derived from sources such as microarray experiments. The Arabidopsis annotation data will also facilitate annotation of newly sequenced plant genomes by using sequence similarity to transfer annotations to homologous genes. In addition, complete and up-to-date annotations will make unknown genes easy to identify and target for experimentation. Here, we describe the process of Arabidopsis functional annotation using a variety of data sources and illustrate several ways in which this information can be accessed and used to infer knowledge about Arabidopsis and other plant species.
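As a minimal sketch of the annotation model described above, each annotation can be treated as a record that links a locus to a controlled vocabulary term and carries an evidence code and a reference. The class name, field names, locus identifiers, term identifiers, and reference strings below are hypothetical placeholders chosen for illustration; they are not TAIR's actual schema or data. Only the evidence codes (IDA, ISS, IEA, and so on) are standard Gene Ontology Consortium codes.

```python
from dataclasses import dataclass

# Standard GO evidence codes that indicate direct experimental support.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    """One locus-to-term association (hypothetical structure, not TAIR's schema)."""
    locus: str          # placeholder locus identifier
    term_id: str        # placeholder controlled vocabulary term identifier
    aspect: str         # "function", "process", or "component"
    evidence_code: str  # GO evidence code, e.g. "IDA" or "IEA"
    reference: str      # placeholder reference identifier


def unique_loci(annotations):
    """Return the set of distinct loci that have at least one annotation."""
    return {a.locus for a in annotations}


def experimentally_supported(annotations):
    """Keep only annotations whose evidence code is experimental."""
    return [a for a in annotations if a.evidence_code in EXPERIMENTAL_CODES]


# Made-up records for illustration only.
annotations = [
    Annotation("LOCUS_A", "TERM:kinase_activity", "function", "ISS", "REF:1"),
    Annotation("LOCUS_A", "TERM:nucleus", "component", "IDA", "REF:2"),
    Annotation("LOCUS_B", "TERM:nucleus", "component", "IEA", "REF:3"),
]

print(len(annotations), "annotations covering", len(unique_loci(annotations)), "loci")
print([a.term_id for a in experimentally_supported(annotations)])
```

Filtering on the evidence code is one simple way to use the evidence tags to assess data quality, for example to separate literature-based experimental annotations from computational ones before comparing gene sets.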
Related articles
Plant Ontology (PO): a Controlled Vocabulary of Plant Structures and Growth Stages
The Plant Ontology Consortium (POC) (www.plantontology.org) is a collaborative effort among several plant databases and experts in plant systematics, botany and genomics. A primary goal of the POC is to develop simple yet robust and extensible controlled vocabularies that accurately reflect the biology of plant structures and developmental stages. These provide a network of vocabularies linked ...
Computational Protein Function Prediction: Framework and Challenges
Large scale genome sequencing technologies are increasing the abundance of experimental data which requires functional characterization. There is a continually widening gap between the mounting numbers of available genomes and completeness of their annotations, which makes it impractical to manually curate the genomes for function information. To handle this growing challenge we need computatio...
Arabidopsis leaf plasma membrane proteome using a gel free method: Focus on receptor-like kinases
The hydrophobic proteins of the plant plasma membrane remain largely unknown. For example, in the Arabidopsis genome, receptor-like kinases (RLKs), plasma membrane proteins that function as the primary receptors in the signaling of stress conditions, hormones, and the presence of pathogens, form a diverse family of over 610 genes. A limited number of these proteins have appeared in pr...
The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools
The Arabidopsis Information Resource (TAIR, http://arabidopsis.org) is a genome database for Arabidopsis thaliana, an important reference organism for many fundamental aspects of biology as well as basic and applied plant biology research. TAIR serves as a central access point for Arabidopsis data, annotates gene function and expression patterns using controlled vocabulary terms, and maintains ...
Journal title:
- Plant Physiology
Volume 135, Issue 2
Publication date: 2004