Systematically Differentiating Functions for Alternatively Spliced Isoforms through Integrating RNA-seq Data

Authors

  • Ridvan Eksi
  • Hongdong Li
  • Rajasree Menon
  • Yuchen Wen
  • Gilbert S. Omenn
  • Matthias Kretzler
  • Yuanfang Guan
Abstract

Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires 'ground-truth' functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the 'responsible' isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. Through cross-validation, we demonstrated that our algorithm is effective in assigning functions to genes, especially the ones with multiple isoforms, and robust to gene expression levels and removal of homologous gene pairs. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the 'responsible' isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6. Our generic framework is the first to predict and differentiate functions for alternatively spliced isoforms, instead of genes, using genomic data. It is extendable to any base machine learner and other species with alternatively spliced isoforms, and shifts the current gene-centered function prediction to isoform-level predictions.
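The abstract describes identifying the 'responsible' isoform(s) of a gene when functional labels exist only at the gene level. One common way to frame this kind of problem is multiple-instance learning, where each gene is a bag of isoform feature vectors. The sketch below is only an illustration under that assumption, not the authors' implementation: the centroid-based scorer, the function names, and the toy feature vectors are all hypothetical, and the abstract itself notes that any base learner could be substituted.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def score(x: Vector, w: Vector, b: float) -> float:
    """Linear score; positive values lean toward 'has the function'."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def fit_centroid_scorer(pos: List[Vector], neg: List[Vector]) -> Tuple[List[float], float]:
    """Hypothetical stand-in base learner: a hyperplane halfway between class centroids."""
    mu_p, mu_n = mean(pos), mean(neg)
    w = [p - n for p, n in zip(mu_p, mu_n)]
    mid = [(p + n) / 2 for p, n in zip(mu_p, mu_n)]
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b


def select_responsible(
    genes: Dict[str, List[Vector]],
    labels: Dict[str, bool],
    n_iter: int = 10,
) -> Tuple[Dict[str, int], Tuple[List[float], float]]:
    """Alternate between fitting an isoform-level model and re-picking,
    for each positive gene, its highest-scoring isoform."""
    positives = [g for g in genes if labels[g]]
    # Isoforms of negative genes are all treated as negative instances.
    neg = [iso for g in genes if not labels[g] for iso in genes[g]]
    # Start by treating every isoform of a positive gene as positive.
    pos = [iso for g in positives for iso in genes[g]]
    chosen: Dict[str, int] = {}
    w, b = fit_centroid_scorer(pos, neg)
    for _ in range(n_iter):
        new_chosen = {
            g: max(range(len(genes[g])), key=lambda i: score(genes[g][i], w, b))
            for g in positives
        }
        if new_chosen == chosen:
            break  # selections are stable
        chosen = new_chosen
        pos = [genes[g][i] for g, i in chosen.items()]
        w, b = fit_centroid_scorer(pos, neg)
    return chosen, (w, b)


# Toy data (made up): two positive genes with two isoforms each, two negative genes.
genes = {
    "A": [[5.0, 5.0], [0.0, 0.0]],
    "B": [[0.2, 0.1], [4.5, 5.2]],
    "C": [[0.0, 0.3], [0.1, -0.2]],
    "D": [[-0.1, 0.1]],
}
labels = {"A": True, "B": True, "C": False, "D": False}

chosen, (w, b) = select_responsible(genes, labels)
print(chosen)
```

The returned indices mark which isoform of each positive gene the toy model treats as responsible, and the fitted `(w, b)` can then score isoforms of unlabeled genes individually rather than scoring genes as a whole.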


Similar resources

A mixed model approach for joint genetic analysis of alternatively spliced transcript isoforms using RNA-Seq data

RNA-Seq technology allows for studying the transcriptional state of the cell at an unprecedented level of detail. Beyond quantification of whole-gene expression, it is now possible to disentangle the abundance of individual alternatively spliced transcript isoforms of a gene. A central question is to understand the regulatory processes that lead to differences in relative abundance variation du...


SparseIso: a novel Bayesian approach to identify alternatively spliced isoforms from RNA-seq data

Motivation: Recent advances in high-throughput RNA sequencing (RNA-seq) technologies have made it possible to reconstruct the full transcriptome of various types of cells. It is important to accurately assemble transcripts or identify isoforms for an improved understanding of molecular mechanisms in biological systems. Results: We have developed a novel Bayesian method, SparseIso, to reliably i...


Modeling the functional relationship network at the isoform level through heterogeneous data integration

Functional relationship networks, which reveal the collaborative roles between genes, have significantly accelerated our understanding of gene functions and phenotypic relevance. However, establishing such networks for alternativ...


Modeling the functional relationship network at the splice isoform level through heterogeneous data integration

Functional relationship networks, which reveal the collaborative roles between genes, have significantly accelerated our understanding of gene functions and phenotypic relevance. However, establishing such networks for alternativ...


Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq.

RNA-seq is currently the technology of choice for global measurement of transcript abundances in cells. Despite its successes, isoform-level quantification remains difficult because short RNA-seq reads are often compatible with multiple alternatively spliced isoforms. Existing methods rely heavily on uniquely mapping reads, which are not available for numerous isoforms that lack regions of uniq...



Journal title:

Volume 9  Issue 

Pages  -

Publication date 2013