The Protein Naming Utility: a rules database for protein nomenclature
Abstract
Generation of syntactically correct and unambiguous names for proteins is a challenging, yet vital task for functional annotation processes. Proteins are often named based on homology to known proteins, many of which have problematic names. To address the need to generate high-quality protein names, and capture our significant experience correcting protein names manually, we have developed the Protein Naming Utility (PNU, http://www.jcvi.org/pn-utility). The PNU is a web-based database for storing and applying naming rules to identify and correct syntactically incorrect protein names, or to replace synonyms with their preferred name. The PNU allows users to generate and manage collections of naming rules, optionally building upon the growing body of rules generated at the J. Craig Venter Institute (JCVI). Since communities often enforce disparate conventions for naming proteins, the PNU supports grouping rules into user-managed collections. Users can check their protein names against a selected PNU rule collection, generating both statistics and corrected names. The PNU can also be used to correct GenBank table files prior to submission to GenBank. Currently, the database features 3080 manual rules that have been entered by JCVI Bioinformatics Analysts as well as 7458 automatically imported names.
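The rule-application workflow described in the abstract (apply a collection of naming rules, replace synonyms with a preferred name, then report statistics alongside the corrected names) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `NamingRule` class, the example rules, the synonym table, and the function names are hypothetical and are not taken from the PNU database or its interface.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NamingRule:
    """One hypothetical naming rule: a regex and its replacement text."""
    pattern: str
    replacement: str


# Illustrative rule collection (assumed, not actual PNU rules).
EXAMPLE_RULES = [
    NamingRule(r"\s{2,}", " "),                 # collapse runs of whitespace
    NamingRule(r"(?i)\bprotien\b", "protein"),  # fix a common misspelling
]

# Illustrative synonym table mapping a whole name to its preferred form.
EXAMPLE_SYNONYMS = {"unknown protein": "hypothetical protein"}


def correct_name(name, rules, synonyms):
    """Apply each rule in order, then swap a synonym for its preferred name."""
    corrected = name.strip()
    for rule in rules:
        corrected = re.sub(rule.pattern, rule.replacement, corrected)
    # Synonym lookup is case-insensitive and matches the whole name only.
    return synonyms.get(corrected.lower(), corrected)


def check_names(names, rules, synonyms):
    """Return the corrected names plus simple statistics on what changed."""
    corrected = [correct_name(n, rules, synonyms) for n in names]
    changed = sum(1 for old, new in zip(names, corrected) if old != new)
    return corrected, {"total": len(names), "changed": changed}
```

Calling `check_names(["DNA  polymerase III", "ATP synthase"], EXAMPLE_RULES, EXAMPLE_SYNONYMS)` would collapse the double space in the first name and leave the second untouched. Keeping rules as plain data, separate from the code that applies them, is one way to support the user-managed collections the abstract mentions, since each community could maintain its own list.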
Similar resources
GENERATING FUZZY RULES FOR PROTEIN CLASSIFICATION
This paper considers the generation of some interpretable fuzzy rules for assigning an amino acid sequence into the appropriate protein superfamily. Since the main objective of this classifier is the interpretability of rules, we have used the distribution of amino acids in the sequences of proteins as features. These features are the occurrence probabilities of six exchange groups in the seque...
The Role of Iranian Culture and Brocade (Zarbaft) in Nomenclature of the Silk Road
The term "Silk Road" was first used in 1876 AD (1292 AH. / 1254 ASH.) by a German geographer and tourist during his travel to China. Richthofen chose this name for the vast network of roads connecting Asia and Europe, from the China Sea to Central and Western Asia, especially the Iranian plateau, and Anatolia to the Mediterranean coast. This nomenclature was influenced by several circumstances ...
Automated alignment and nomenclature for consistent treatment of polymorphisms in the human mitochondrial DNA control region.
Naming mtDNA sequences by listing only those sites that differ from a reference sequence is the standard practice for describing the observed variations. Consistency in nomenclature is desirable so that all sequences in a database that are concordant with an evidentiary sequence will be found for estimating the rarity of that profile. The operational alignment and nomenclature rules, i.e., "Wil...
Phylogeny as the basis for naming histones.
Thirty years ago, most proteins were still discovered by protein sequencing, whereas in the genomic era, most proteins are now discovered by conceptually translating DNA sequences, and many are found to be members of protein families with orthologs and paralogs in multiple organisms. The naming of members of large protein families can rapidly become haphazard or contradictory; therefore, nomenc...
Renaming genes and duplication of gene names in the literature.
The November 2001 issue of The Plant Cell includes a letter to the editor from Sheng Luan and colleagues regarding the renaming of genes and duplication of gene names in the literature. Plant Physiology joins with The Plant Cell in recognizing gene nomenclature as an important issue and fully supports adherence to convention for naming genes. The policy of Plant Physiology will also be changed ...