Evaluating the statistical significance of biclusters

Authors

  • Jason D. Lee
  • Yuekai Sun
  • Jonathan E. Taylor

Abstract

Biclustering (also known as submatrix localization) is a problem of high practical relevance in exploratory analysis of high-dimensional data. We develop a framework for performing statistical inference on biclusters found by score-based algorithms. Since the bicluster was selected in a data-dependent manner by a biclustering or localization algorithm, this is a form of selective inference. Our framework gives exact (non-asymptotic) confidence intervals and p-values for the significance of the selected biclusters.
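
To see why data-dependent selection calls for selective inference, the following minimal Python sketch simulates pure-noise matrices, picks the highest-scoring submatrix, and then applies a naive z-test that ignores the selection step. It is not the authors' procedure: the exhaustive search stands in for a score-based biclustering algorithm, and the function names, matrix size (6 x 6), bicluster size (2 x 2), and i.i.d. standard normal noise model are all assumptions made for illustration.

```python
import itertools
import math
import random


def best_submatrix_sum(matrix, k):
    """Exhaustively find the largest entry sum over all k x k submatrices."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    best = -math.inf
    for rows in itertools.combinations(range(n_rows), k):
        for cols in itertools.combinations(range(n_cols), k):
            total = sum(matrix[i][j] for i in rows for j in cols)
            best = max(best, total)
    return best


def naive_p_value(total, k):
    """One-sided z-test p-value that ignores how the submatrix was chosen."""
    # Under the assumed null, a fixed k x k block of i.i.d. N(0, 1)
    # entries has a sum with standard deviation k.
    z = total / k
    return 0.5 * math.erfc(z / math.sqrt(2))


def naive_false_positive_rate(n=6, k=2, trials=200, alpha=0.05, seed=0):
    """Fraction of pure-noise matrices in which the naive test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
        if naive_p_value(best_submatrix_sum(noise, k), k) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print(naive_false_positive_rate())
```

Because the tested submatrix is the maximum over many candidates, the naive test rejects far more often than the nominal level even though no bicluster exists. This is the gap that p-values conditioned on the selection event are meant to close.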

Similar resources

Analysis and visualization of gene expression data using biclustering: A comparative study

In the last few years the gene expression microarray technology has become a central tool in the field of functional genomics in which the expression levels of thousands of genes in a biological sample are determined in a single experiment. Several clustering and biclustering methods have been introduced to analyze the gene expression data by identifying the similar patterns and grouping genes ...

SiBIC: A Web Server for Generating Gene Set Networks Based on Biclusters Obtained by Maximal Frequent Itemset Mining

Detecting biclusters from expression data is useful, since biclusters are coexpressed genes under only part of all given experimental conditions. We present a software called SiBIC, which from a given expression dataset, first exhaustively enumerates biclusters, which are then merged into rather independent biclusters, which finally are used to generate gene set networks, in which a gene set as...

Comparison of Biological Significance of Biclusters of SIMBIC and SIMBIC+ Biclustering Models

Query-driven Biclustering Model refers to the problem of extracting biclusters based on a query gene or query condition. The extracted biclusters consist of a set of genes and a subset of conditions that are similar to the query gene or query condition and it includes the query input also. Two approaches applied for biclustering problems are top-down and bottom-up, based on how they tackle the p...

Discovering statistically significant biclusters in gene expression data

In gene expression data, a bicluster is a subset of the genes exhibiting consistent patterns over a subset of the conditions. We propose a new method to detect significant biclusters in large expression datasets. Our approach is graph theoretic coupled with statistical modelling of the data. Under plausible assumptions, our algorithm is polynomial and is guaranteed to find the most significant ...

Constrained Subspace Clustering for Time Series Gene Expression Data

For time series gene expression data, it is an important problem to find subgroups of genes with similar expression pattern in a consecutive time window. In this paper, we extend a fuzzy c-means clustering algorithm to construct two models to detect biclusters respectively, i.e., constant value biclusters and similarity-based biclusters whose gene expression profiles are similar within consecut...

Publication date: 2015