Transformation and model choice for RNA-seq co-expression analysis.

نویسندگان

  • Andrea Rau
  • Cathy Maugis-Rabusseau
چکیده

Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of if and how such methods may be applied to RNA sequencing (RNA-seq) data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and number of clusters and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four varied RNA-seq data sets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we propose a Bioconductor package coseq (co-expression of RNA-seq data) to facilitate implementation and visualization of the recommended RNA-seq co-expression analyses.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Regulatory effects of cis- and trans-LncRNAs on differential expression of genes following infection with viral hemorrhagic septicemia virus in rainbow trout (Oncorhynchus mykiss)

In this study the cis and trans regulatory effect of long non-coding genes (lncRNA) on the expression of genes in fish infected by Viral hemorrhagic septicemia virus (VHS) was investigated using RNA-seq technology. At the end of experimental period (the thirty fifth day), total RNA was extracted from spleen tissue (group treated with virus) and physiological serum (control group) was used to pr...

متن کامل

I-13: Transcriptome Dynamics of Human and Mouse Preimplantation Embryos Revealed by Single Cell RNA-Sequencing

Background: Mammalian preimplantation development is a complex process involving dramatic changes in the transcriptional architecture. However, it is still unclear about the crucial transcriptional network and key hub genes that regulate the proceeding of preimplantation embryos. Materials and Methods: Through single-cell RNAsequencing (RNA-seq) of both human and mouse preimplantation embryos, ...

متن کامل

Identification of soybean circular RNAs in response to low nitrogen and phosphorus stress

Soybean, one of the most important sources of edible oil and protein in the world, is exposed to various environmental biotic and abiotic stresses. These stresses can negatively impact the quality and quantity of soybean production. This study aimed to identify genes that express circular RNAs in response to low phosphorus and nitrogen stresses in soybean roots. Soybean seeds were grown under d...

متن کامل

Impact of Gene Annotation on RNA-seq Data Analysis

RNA-seq has become increasingly popular in transcriptome profiling. One of the major challenges in RNA-seq data analysis is the accurate mapping of junction reads to their genomic origins. To detect splicing sites in short reads, many RNA-seq aligners use reference transcriptome to inform placement of junction reads. However, no systematic evaluation has been performed to assess or quantify the...

متن کامل

Gene expression Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models

Motivation: In recent years, gene expression studies have increasingly made use of high-throughput sequencing technology. In turn, research concerning the appropriate statistical methods for the analysis of digital gene expression (DGE) has flourished, primarily in the context of normalization and differential analysis. Results: In this work, we focus on the question of clustering DGE profiles ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Briefings in bioinformatics

دوره   شماره 

صفحات  -

تاریخ انتشار 2017