High-Throughput Function Assignment for Novel Gene Products Using Annotation Clustering

Authors

  • Alexander Renner
  • Hilmar Lapp
  • András Aszódi
Abstract

We have designed and implemented a software package for automatic, high-throughput prediction of gene function. The system attempts to assign a biological function to protein sequences by searching sequence databanks and by locating functionally relevant motifs in the query sequences. The results produced by the various prediction methods consist of the annotations of matching sequences and/or motifs. These annotations are free-format texts written by humans and may therefore describe the same concept with synonymous words. We considered it desirable to present the results so that annotations describing the same biological function are grouped together, sparing the user from reading through all of them. To this end we devised an algorithm for the hierarchical clustering of free-format documents based on the similarity of their contents. This poster presents an enhanced version of our previously published method [1].
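The grouping idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure or linkage rule the authors use, so everything below is an assumption made for illustration: annotations are compared by cosine similarity over bag-of-words counts, clusters are merged by average linkage, and merging stops at a hypothetical `threshold` parameter. The function names (`tokenize`, `cosine`, `cluster_annotations`) are likewise invented here. Plain word overlap also does not handle synonyms, which the abstract names as a motivating problem.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a free-text annotation and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_annotations(annotations, threshold=0.3):
    """Average-linkage agglomerative clustering of annotation strings.

    Assumption: merging stops once the best average similarity between
    any two clusters falls below `threshold`. Returns a sorted list of
    clusters, each a sorted list of indices into `annotations`.
    """
    vectors = [Counter(tokenize(a)) for a in annotations]
    clusters = [[i] for i in range(len(annotations))]

    def linkage(c1, c2):
        # Mean pairwise similarity between members of the two clusters.
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)
```

Given four annotations, two describing a serine/threonine protein kinase and two describing a zinc finger transcription factor, this sketch places the two kinase texts in one cluster and the two transcription factor texts in another, since the groups share no words.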


Similar resources

Comparison of hierarchical and non-hierarchical clustering results for proteins associated with esophageal, gastric and colon cancers based on similarities in gene ontology annotation

Background and Objective: The use of proteomic methodologies and the advent of high-throughput (HTP) investigation of proteins have created a need for new approaches in the bioinformatics analysis of experimental results. Cluster analysis is a suitable statistical procedure that can be useful for analyzing these data sets. Materials and Methods: In this research study, the identified proteins associated wi...

High-throughput functional annotation of novel gene products using document clustering.

Gene products differentially expressed in healthy vs. diseased tissues may be considered drug targets since the change in their expression level can be related to the cause and progression of the disease studied. A significant portion of the proteins produced by these genes will be unknown and consequently their function must be characterised. The experimental elucidation of biochemical functio...

Functional gene clustering via gene annotation sentences, MeSH and GO keywords from biomedical literature

Gene function annotation remains a key challenge in modern biology. This is especially true for high-throughput techniques such as gene expression experiments. Vital information about genes is available electronically from biomedical literature in the form of full texts and abstracts. In addition, various publicly available databases (such as GenBank, Gene Ontology and Entrez) provide access to...

Statistically rigorous automated protein annotation

Motivation: Assignment of putative protein functional annotation by comparative analysis using pre-defined experimental annotations is performed routinely by molecular biologists. The number and statistical significance of these assignments remains a challenge in this era of high-throughput proteomics. A combined statistical method that enables robust, automated protein annotation by reliably ex...

Graph-based sequence annotation using a data integration approach

The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional ...


Publication date: 2000