Numerical Data Co-clustering via Sum-Squared Residue Minimization and User-defined Constraint Satisfaction
Abstract
Co-clustering aims at computing a bi-partition, that is, a collection of co-clusters: each co-cluster is a group of objects associated with a group of attributes, and these associations can support interpretation. We consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e., both objects and attributes can be involved), but also for interval constraints that enforce properties of co-clusters when considering ordered domains. We propose an iterative co-clustering algorithm that exploits user-defined constraints while minimizing the sum-squared residues, i.e., an objective function introduced for gene expression data clustering by Cho et al. (2004).
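As a rough illustration of the objective named in the abstract, the sketch below evaluates a sum-squared residue for a given bi-partition, together with a count of violated must-link and cannot-link constraints. It assumes the residue variant h_ij = a_ij - a_iJ - a_Ij + a_IJ (row mean, column mean, and overall mean taken within each co-cluster), which is one of the forms discussed by Cho et al. (2004). The function names and the constraint encoding as index pairs are hypothetical, interval constraints are not covered, and this is not the paper's iterative algorithm.

```python
import numpy as np

def sum_squared_residue(A, row_labels, col_labels):
    """Sum of squared residues over all co-clusters of a bi-partition.

    Assumed residue: h_ij = a_ij - a_iJ - a_Ij + a_IJ, with means taken
    inside the co-cluster (I, J) that contains entry (i, j).
    """
    A = np.asarray(A, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    total = 0.0
    for r in np.unique(row_labels):
        rows = np.flatnonzero(row_labels == r)
        for c in np.unique(col_labels):
            cols = np.flatnonzero(col_labels == c)
            block = A[np.ix_(rows, cols)]                   # co-cluster submatrix
            row_means = block.mean(axis=1, keepdims=True)   # a_iJ
            col_means = block.mean(axis=0, keepdims=True)   # a_Ij
            residue = block - row_means - col_means + block.mean()
            total += float((residue ** 2).sum())
    return total

def violated_constraints(labels, must_link=(), cannot_link=()):
    """Count violated pairwise constraints (hypothetical encoding: index pairs).

    The same function can be applied to object labels or attribute labels.
    """
    broken_ml = sum(labels[i] != labels[j] for i, j in must_link)
    broken_cl = sum(labels[i] == labels[j] for i, j in cannot_link)
    return int(broken_ml + broken_cl)
```

A purely additive matrix (a_ij = r_i + c_j) has zero residue under any bi-partition with this definition, which gives a quick sanity check for the function.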
Similar resources
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and the social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into existing clustering algorithms. One well-researched constrained clustering approach is microaggregation. In a microaggreg...
Minimum Sum-Squared Residue Co-Clustering of Gene Expression Data
Microarray experiments have been extensively used for simultaneously measuring DNA expression levels of thousands of genes in genome research. A key step in the analysis of gene expression data is the clustering of genes into groups that show similar expression values over a range of conditions. Since only a small subset of the genes participate in any cellular process of interest, by focusing ...
Effect of Data Transformation on Residue
Recently, Aguilar-Ruiz [2005] considered a data matrix containing both scaling and shifting factors and showed that the mean squared residue [Cheng and Church, 2000], called RESIDUE(II) in this paper, is useful for discovering shifting patterns but not appropriate for finding scaling patterns. This finding draws our attention to the weakness of the RESIDUE(II) measure and the need for new approaches to disco...
Microarray Time-Series Data Clustering via Multiple Alignment of Gene Expression Profiles
Genes with similar expression profiles are expected to be functionally related or co-regulated. In this direction, clustering microarray time-series data via pairwise alignment of piece-wise linear profiles has been recently introduced. We propose a k-means clustering approach based on a multiple alignment of natural cubic spline representations of gene expression profiles. The multiple alignme...
Constrained Co-clustering of Gene Expression Data
In many applications, expert interpretation of a co-clustering is easier than that of a mono-dimensional clustering. Co-clustering aims at computing a bi-partition, that is, a collection of co-clusters: each co-cluster is a group of objects associated with a group of attributes, and these associations can support interpretation. Many constrained clustering algorithms have been proposed to exploit the do...