Gene Ontology Annotation Using Word Proximity Relationship
Authors
Abstract
In this paper, we propose an approach to Gene Ontology (GO) annotation of full-text biomedical articles. The system explores the word proximity relationship between genes and GO terms: genes and GO terms are associated by evaluating a density function over gene-GO pairs that occur within the same paragraph. We build several density models and apply multiple evaluation criteria to assess the proposed methods. In the best case, we obtained a precision of approximately 88% and a recall of approximately 12%.
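The abstract does not specify the density models, so the following is only a minimal sketch of the general idea of proximity-based gene-GO association, not the authors' method. Everything here is an assumption chosen for illustration: the regex tokenizer, exact string matching of gene and GO term mentions, distance measured between mention start positions, a Gaussian kernel with bandwidth `sigma=5.0` as the "density function", and a decision threshold of `0.5`. The helper names (`proximity_scores`, `predict`, `precision_recall`) are hypothetical.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_mentions(tokens, phrase):
    """Return start positions where the tokenized phrase occurs exactly."""
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return []
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target]


def proximity_scores(paragraph, genes, go_terms, sigma=5.0):
    """Score each (gene, GO term) pair co-occurring in one paragraph.

    Hypothetical density: a Gaussian kernel over the token distance between
    the start of each gene mention and the start of each GO term mention,
    summed over all mention pairs. Not the paper's actual density models.
    """
    tokens = tokenize(paragraph)
    scores = defaultdict(float)
    for gene in genes:
        gene_positions = find_mentions(tokens, gene)
        for term in go_terms:
            for g in gene_positions:
                for t in find_mentions(tokens, term):
                    d = abs(g - t)
                    scores[(gene, term)] += math.exp(-d * d / (2 * sigma * sigma))
    return dict(scores)


def predict(scores, threshold=0.5):
    """Keep pairs whose proximity score reaches an assumed threshold."""
    return {pair for pair, s in scores.items() if s >= threshold}


def precision_recall(predicted, gold):
    """Standard set-based precision and recall against curated pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Toy usage with an invented sentence and an invented gold annotation.
paragraph = ("Knockdown of RAD51 impairs DNA repair in these cells, "
             "while ACTB was used as a loading control.")
scores = proximity_scores(paragraph, ["RAD51", "ACTB"], ["DNA repair"])
predicted = predict(scores)
gold = {("RAD51", "DNA repair")}
print(precision_recall(predicted, gold))  # (1.0, 1.0)
```

In this toy paragraph RAD51 is two tokens from "DNA repair" (score about 0.92) and ACTB is six tokens away (score about 0.49), so only the RAD51 pair clears the threshold. Raising the threshold in such a scheme tends to trade recall for precision, which is one plausible reading of the high-precision, low-recall figures reported above.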
Similar resources
A Random Forest proximity matrix as a new measure for gene annotation
In this paper we present a new score for gene annotation. This new score is based on the proximity matrix obtained from a trained Random Forest (RF) model. As an example application, we built this model using the association p-values of genotype with blood phenotype as input and the association of genotype data with coronary heart disease as output. This new score has been validated by comparing...
Gene ontology annotation by density and gravitation models.
Gene Ontology (GO) is developed to provide standard vocabularies of gene products in different databases. The process of annotating GO terms to genes requires curators to read through lengthy articles. Methods for speeding up or automating the annotation process are thus of great importance. We propose a GO annotation approach using full-text biomedical documents for directing more relevant pap...
Defining functional distance using manifold embeddings of gene ontology annotations.
Although rigorous measures of similarity for sequence and structure are now well established, the problem of defining functional relationships has been particularly daunting. Here, we present several manifold embedding techniques to compute distances between Gene Ontology (GO) functional annotations and consequently estimate functional distances between protein domains. To evaluate accuracy, we...
Combining Evidence, Specificity, and Proximity towards the Normalization of Gene Ontology Terms in Text
Structured information provided by manual annotation of proteins with Gene Ontology concepts represents a high-quality reliable data source for the research community. However, a limited scope of proteins is annotated due to the amount of human resources required to fully annotate each individual gene product from the literature. We introduce a novel method for automatic identification of GO te...
Identifying informative subsets of the Gene Ontology with information bottleneck methods
MOTIVATION: The Gene Ontology (GO) is a controlled vocabulary designed to represent the biological concepts pertaining to gene products. This study investigates the methods for identifying informative subsets of GO terms in an automatic and objective fashion. This task in turn requires addressing the following issues: how to represent the semantic context of GO terms, what metrics are suitable f...