Optimal Estimation and Completion of Matrices with Biclustering Structures

Authors

  • Chao Gao
  • Yu Lu
  • Zongming Ma
  • Harrison H. Zhou

Abstract

Biclustering structures in data matrices were first formalized in a seminal paper by John Hartigan [15], where one seeks to cluster cases and variables simultaneously. Such structures are also prevalent in block modeling of networks. In this paper, we develop a theory for the estimation and completion of matrices with biclustering structures, where the data is a partially observed and noise-contaminated matrix with a certain underlying biclustering structure. In particular, we show that a constrained least squares estimator achieves minimax rate-optimal performance in several of the most important scenarios. To this end, we derive unified high-probability upper bounds for all sub-Gaussian data and also provide matching minimax lower bounds in both Gaussian and binary cases. Due to the close connection of graphons to stochastic block models, an immediate consequence of our general results is a minimax rate-optimal estimator for sparse graphons.
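
The constrained least squares estimator mentioned in the abstract can be made concrete on a toy problem. The Python/NumPy sketch below is an illustration under stated assumptions, not the authors' code: it assumes the numbers of row and column clusters k and l are known, measures squared error over the observed entries only (the paper's exact objective, for instance any rescaling by the observation probability, may differ), and finds the minimizer by brute force over all row and column labelings. The helper names block_means and constrained_least_squares are hypothetical. For fixed labels, the least-squares value of each block is the mean of its observed entries, so the search reduces to comparing labelings.

```python
import itertools

import numpy as np


def block_means(Y, mask, zr, zc, k, l):
    """Least-squares block values for fixed row labels zr and column labels zc.

    Q[a, b] is the mean of the observed entries of block (a, b). Blocks with no
    observed entries are set to 0, an arbitrary choice that does not affect the loss.
    """
    Q = np.zeros((k, l))
    for a in range(k):
        for b in range(l):
            rows, cols = np.ix_(zr == a, zc == b)
            vals = Y[rows, cols][mask[rows, cols]]
            if vals.size > 0:
                Q[a, b] = vals.mean()
    return Q


def constrained_least_squares(Y, mask, k, l):
    """Exhaustive constrained least squares over all k-by-l biclusterings.

    Cost grows like k**n * l**m, so this only illustrates the definition on
    tiny matrices; it is not a practical algorithm.
    """
    n, m = Y.shape
    best_loss, best_theta = np.inf, None
    for zr in itertools.product(range(k), repeat=n):
        zr = np.array(zr)
        for zc in itertools.product(range(l), repeat=m):
            zc = np.array(zc)
            Q = block_means(Y, mask, zr, zc, k, l)
            theta = Q[np.ix_(zr, zc)]  # theta[i, j] = Q[zr[i], zc[j]]
            # Squared error on observed entries only (an assumed form of the objective).
            loss = np.sum((Y[mask] - theta[mask]) ** 2)
            if loss < best_loss:
                best_loss, best_theta = loss, theta
    return best_theta


# Toy example (hypothetical data): 2 x 2 block pattern plus small Gaussian noise.
rng = np.random.default_rng(0)
zr_true = np.array([0, 0, 0, 1, 1, 1])
zc_true = np.array([0, 0, 0, 1, 1, 1])
Q_true = np.array([[2.0, 0.0], [0.0, 2.0]])
theta_true = Q_true[np.ix_(zr_true, zc_true)]

mask = np.ones((6, 6), dtype=bool)
for i, j in [(0, 1), (2, 4), (4, 0), (5, 5)]:
    mask[i, j] = False  # these entries are unobserved

Y = theta_true + 0.1 * rng.standard_normal((6, 6))
Y[~mask] = np.nan  # missing entries carry no information

theta_hat = constrained_least_squares(Y, mask, k=2, l=2)
print(np.round(theta_hat, 2))  # the four missing entries are filled in
```

Because the search enumerates k^n l^m labelings, it is only meant to make the definition tangible; the results summarized in the abstract concern the statistical accuracy of the estimator, not how to compute it efficiently.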

Similar resources

An Approximation Ratio for Biclustering

The problem of biclustering consists of the simultaneous clustering of rows and columns of a matrix such that each of the submatrices induced by a pair of row and column clusters is as uniform as possible. In this paper we approximate the optimal biclustering by applying one-way clustering algorithms independently on the rows and on the columns of the input matrix. We show that such a solution ...

Structure of Wavelet Covariance Matrices and Bayesian Wavelet Estimation of Autoregressive Moving Average Model with Long Memory Parameter’s

In the process of exploring and recognizing statistical communities, the analysis of data obtained from these communities is considered essential. One of the appropriate methods for data analysis is the structural study of the function fitted by these data. Wavelet transformation is one of the most powerful tools in the analysis of these functions, and the structure of wavelet coefficients is very impor...

Spectral biclustering of microarray cancer data: co-clustering genes and conditions

Global analyses of RNA expression levels are useful for classifying genes and overall phenotypes. Often these classification problems are linked, and one wants to simultaneously find "marker genes" that are differentially expressed in particular "conditions". We have developed a method that simultaneously clusters genes and conditions, finding distinctive "checkerboard" patterns in matrices of ...

A structured view on pattern mining-based biclustering

Mining matrices to find relevant biclusters, subsets of rows exhibiting a coherent pattern over a subset of columns, is a critical task for a wide set of biomedical and social applications. Since biclustering is a challenging combinatorial optimization task, existing approaches place restrictions on the allowed structure, coherence and quality of biclusters. Biclustering approaches relying on p...

Mining Maximally Banded Matrices in Binary Data

Binary data occurs often in several real-world applications ranging from social networks to bioinformatics. Extracting patterns from such data has been a focus of fundamental data mining tasks including association rule analysis, sequence mining and bi-clustering. Recently, the utility of banded structures in binary matrices has been pointed out with applications in paleontology, bioinformatics...

Journal title:
  • Journal of Machine Learning Research

Volume 17

Publication date: 2016