greedy clustering method

Beam Search Induction and Similarity Constraints for Predictive Clustering Trees

2006

Dragi Kocev Jan Struyf Saso Dzeroski

Much research on inductive databases (IDBs) focuses on local models, such as item sets and association rules. In this work, we investigate how IDBs can support global models, such as decision trees. Our focus is on predictive clustering trees (PCTs). PCTs generalize decision trees and can be used for prediction and clustering, two of the most common data mining tasks. Regular PCT induction buil...

Sparse and Greedy: Sparsifying Submodular Facility Location Problems

2015

Erik M. Lindgren Shanshan Wu Alexandros G. Dimakis

Submodular facility location functions are widely used for summarizing large datasets and have found applications including sensor placement, image retrieval, and clustering. A significant problem is that evaluating such functions typically requires the calculation of pairwise benefits for all items, which is computationally unmanageable for large problems. In this paper we propose a sparsif...
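As a rough illustration of the setting this abstract describes, the sketch below runs the standard greedy heuristic on a dense pairwise benefit matrix. It is not the sparsified method the paper proposes. The names `facility_location` and `greedy_select`, the non-negative benefit assumption, and the example matrix are all hypothetical.

```python
def facility_location(benefit, selected):
    """F(S): for each item i, take the best benefit offered by any selected j."""
    if not selected:
        return 0
    return sum(max(row[j] for j in selected) for row in benefit)


def greedy_select(benefit, k):
    """Pick up to k representatives, each time adding the largest marginal gain.

    Assumes benefit[i][j] >= 0 is the benefit item i receives when j is chosen.
    """
    n = len(benefit)
    selected = []
    best_cover = [0] * n  # best benefit each item currently receives
    for _ in range(min(k, n)):
        best_gain, best_j = -1, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j, computed from the full column j.
            gain = sum(max(benefit[i][j] - best_cover[i], 0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best_cover = [max(best_cover[i], benefit[i][best_j]) for i in range(n)]
    return selected


# Hypothetical 4-item benefit matrix with two loose groups: {0, 1} and {2, 3}.
benefit = [
    [5, 4, 1, 0],
    [4, 5, 0, 1],
    [1, 0, 5, 3],
    [0, 1, 2, 5],
]
chosen = greedy_select(benefit, 2)
```

Every marginal-gain evaluation reads a full column of the matrix, which is the all-pairs cost the abstract identifies as the bottleneck for large problems.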

Application of Cardinality based GRASP to the Biclustering of Gene Expression Data

2010

Shyama Das Sumam Mary Idicula Yizong Cheng George M. Church Anupam Chakraborty Hitashyam Maka Gary A Kochenberger Smitha Dharan

Biclustering algorithms perform simultaneous row and column clustering of a given data matrix. In a gene expression dataset, a bicluster is a subset of genes that exhibit similar expression patterns across a subset of conditions. Biclustering is a useful data mining technique for identifying local patterns in gene expression data. In this paper, biclusters are identified in two steps. In the fir...

Similarity Constraints in Beam-search Building of Predictive Clustering Trees

2006

Dragi Kocev Jan Struyf

We investigate how inductive databases (IDBs) can support global models, such as decision trees. We focus on predictive clustering trees (PCTs). PCTs generalize decision trees and can be used for prediction and clustering, two of the most common data mining tasks. Regular PCT induction builds PCTs top-down, using a greedy algorithm, similar to that of C4.5. We propose a new induction algorithm ...

Massive fungal biodiversity data re-annotation with multi-level clustering

2014

Duong Vu Szániszló Szöke Christian Wiwie Jan Baumbach Gianluigi Cardinali Richard Röttger Vincent Robert

With the availability of newer and cheaper sequencing methods, genomic data are being generated at an increasingly fast pace. In spite of the high degree of complexity of currently available search routines, the massive number of sequences available virtually prohibits quick and correct identification of large groups of sequences sharing common traits. Hence, there is a need for clustering tool...

Exact and rapid linear clustering of networks with dynamic programming

Journal: Proceedings of The Royal Society A: Mathematical, Physical and Engineering Sciences, 2023

We study the problem of clustering networks whose nodes have imputed or physical positions in a single dimension, such as prestige hierarchies or the similarity dimension of hyperbolic embeddings. Existing algorithms, such as the critical gap method and other greedy strategies, only offer approximate solutions. Here, we introduce a dynamic programming approach that returns provably optimal solutions in polynomial time --...
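To make the idea of exact clustering along a single dimension concrete, here is a minimal dynamic programming sketch. It assumes clusters are contiguous in the sorted order of positions and that the objective is a sum of independent per-cluster scores. The function name `optimal_linear_clustering` and the example `score` are hypothetical and are not taken from the paper.

```python
def optimal_linear_clustering(positions, score):
    """Best partition of nodes into clusters that are contiguous in 1-D order.

    positions: one coordinate per node.
    score: hypothetical callable mapping a list of positions to a number;
           the objective is assumed to be the sum of per-cluster scores.
    Returns (best total score, clusters as lists of node indices).
    """
    order = sorted(range(len(positions)), key=lambda v: positions[v])
    n = len(order)
    best = [0.0] + [float("-inf")] * n  # best[j]: optimum for the first j nodes
    cut = [0] * (n + 1)                 # cut[j]: start index of the last cluster
    for j in range(1, n + 1):
        for i in range(j):
            # Candidate: optimal split of the first i nodes plus cluster i..j-1.
            cand = best[i] + score([positions[v] for v in order[i:j]])
            if cand > best[j]:
                best[j], cut[j] = cand, i
    # Walk the stored cut points backwards to recover the clusters.
    clusters = []
    j = n
    while j > 0:
        i = cut[j]
        clusters.append(order[i:j])
        j = i
    clusters.reverse()
    return best[n], clusters


# Hypothetical objective: penalize cluster width, plus a fixed cost per cluster.
def width_score(xs):
    return -(max(xs) - min(xs)) ** 2 - 5


total, clusters = optimal_linear_clustering([0, 1, 2, 10, 11, 12], width_score)
```

The table `best` is filled with a quadratic number of calls to `score`, and every contiguous split is considered, so the result is optimal for this assumed additive objective rather than a greedy approximation.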

Data Reduction Techniques for Instance-Based Learning from Human/Computer Interface Data

2000

Terran Lane Carla E. Brodley

We describe the task of user-oriented anomaly detection for computer security. In this domain the goal is to develop a model of a computer user’s normal behavioral patterns and to detect anomalous conditions as deviations from expected behaviors. We present an instance-based learning (IBL) system for profiling users and examine some domain constraints with respect to our approach. In particular...

Tight Continuous Relaxation of the Balanced k-Cut Problem

2014

Syama Sundar Rangapuram Pramod Kaushik Mudrakarta Matthias Hein

Spectral clustering as a relaxation of the normalized/ratio cut has become one of the standard graph-based clustering methods. Existing methods for the computation of multiple clusters, corresponding to a balanced k-cut of the graph, are based either on greedy techniques or on heuristics that have only a weak connection to the original motivation of minimizing the normalized cut. In this paper we propos...

Inference of a Phylogenetic Tree: Hierarchical Clustering versus Genetic Algorithm

2012

Glenn Blanchette Richard A. O'Keefe Lubica Benusková

This paper compares the implementations and performance of two computational methods, hierarchical clustering and a genetic algorithm, for inference of phylogenetic trees in the context of the artificial organism Caminalcules. Although these techniques have a superficial similarity, in that they both use agglomeration as their construction method, their origin and approaches are antithetical. F...

GANC: Greedy agglomerative normalized cut for graph clustering

Journal: Pattern Recognition, 2012

Seyed Salim Tabatabaei Mark Coates Michael G. Rabbat

This paper describes a graph clustering algorithm that aims to minimize the normalized cut criterion and has a model order selection procedure. The performance of the proposed algorithm is comparable to spectral approaches in terms of minimizing normalized cut. However, unlike spectral approaches, the proposed algorithm scales to graphs with millions of nodes and edges. The algorithm consists of...
