Correlating Information Contents of Gene Ontology Terms to Infer Semantic Similarity of Gene Products

Author

  • Mingxin Gan
Abstract

Successful applications of the gene ontology to the inference of functional relationships between gene products in recent years have raised the need for computational methods that automatically calculate semantic similarity between gene products based on the semantic similarity of gene ontology terms. Nevertheless, existing methods, though widely used in a variety of applications, may significantly overestimate semantic similarity between genes that are not actually functionally related, thereby yielding misleading results. To overcome this limitation, we propose to represent a gene product as a vector composed of the information contents of the gene ontology terms annotated for the gene product, and we suggest calculating similarity between two gene products as the relatedness of their corresponding vectors using three measures: Pearson's correlation coefficient, cosine similarity, and the Jaccard index. We focus on the biological process domain of the gene ontology and annotations of yeast proteins to study the effectiveness of the proposed measures. Results show that semantic similarity scores calculated using the proposed measures are more consistent with known biological knowledge than those derived using a number of existing methods, suggesting the effectiveness of our method in characterizing functional relationships between gene products.
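The vector representation described in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions rather than the paper's stated definitions: information content is taken as the negative log of a term's annotation frequency, terms not annotated to a gene product get a zero entry, and the Jaccard index is generalized to real-valued vectors as the sum of element-wise minima over the sum of element-wise maxima. The term identifiers, counts, and function names are hypothetical.

```python
import math

def information_content(term_counts, total):
    """Assumed definition: IC(t) = -log(count(t) / total)."""
    return {t: -math.log(c / total) for t, c in term_counts.items()}

def ic_vector(annotated_terms, ic, vocabulary):
    """IC of each annotated term, 0.0 for terms not annotated (assumption)."""
    return [ic[t] if t in annotated_terms else 0.0 for t in vocabulary]

def pearson(u, v):
    """Pearson's correlation coefficient; 0.0 if either vector is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def weighted_jaccard(u, v):
    """Weighted Jaccard for non-negative vectors (an assumed generalization)."""
    num = sum(min(a, b) for a, b in zip(u, v))
    den = sum(max(a, b) for a, b in zip(u, v))
    return num / den if den else 0.0

# Hypothetical toy data: four made-up terms and their annotation counts.
counts = {"GO:A": 8, "GO:B": 4, "GO:C": 2, "GO:D": 1}
ic = information_content(counts, total=8)
vocabulary = sorted(counts)

gene1 = ic_vector({"GO:A", "GO:B", "GO:C"}, ic, vocabulary)
gene2 = ic_vector({"GO:A", "GO:B", "GO:D"}, ic, vocabulary)

print("Pearson:", pearson(gene1, gene2))
print("Cosine:", cosine(gene1, gene2))
print("Weighted Jaccard:", weighted_jaccard(gene1, gene2))
```

In this toy setup the two gene products share only a frequent, low-information term with non-zero weight, so all three scores stay low even though two of their three annotations overlap.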

Related resources

Improving Semantic Similarity for Proteins based on the Gene Ontology

One of the current challenges in the Life Sciences is to extract the knowledge contained in the vast amount of data that the genomic and post-genomic techniques are producing. One of the major efforts in this area was the development of the Gene Ontology (GO), a BioOntology that contains terms that describe gene products, organized in a graph structure. Gene products annotated with ontology ter...

Gene Ontology-based Semantic Similarity Measures

Quantitative measure of functional similarity between gene products is important for post-genomics study. The similarity measures may be used to validate high-throughput protein interaction data, help the development of new pathway modelling tools and clustering methods, and enable the identification of functionally related gene products independent of homology [Guo et al., 2006, Schlicker et a...

Semantic Similarity Definition over Gene Ontology by Further Mining of the Information Content

The similarity of two gene products can be used to solve many problems in information biology. Since one gene product corresponds to several GO (Gene Ontology) terms, one way to calculate the gene product similarity is to use the similarity of their GO terms. This GO term similarity can be defined as the semantic similarity on the GO graph. There are many kinds of similarity definitions of two ...

Incorporating Semantic Similarity Measure in Genetic Algorithm: An Approach for Searching the Gene Ontology Terms

The most important property of the Gene Ontology is the terms. These controlled vocabularies are defined to provide consistent descriptions of gene products that are shareable and computationally accessible by humans, software agents, or other machine-readable meta-data. Each term is associated with information such as definition, synonyms, database references, amino acid sequences, and relationshi...

GOSemSim: an R package for measuring semantic similarity among GO terms and gene products

SUMMARY: The semantic comparisons of Gene Ontology (GO) annotations provide quantitative ways to compute similarities between genes and gene groups, and have become an important basis for many bioinformatics analysis approaches. GOSemSim is an R package for semantic similarity computation among GO terms, sets of GO terms, gene products and gene clusters. Four information content (IC)- and a graph-b...

Journal title:

Volume 2014, Issue

Pages -

Publication date: 2014