SSC: Statistical Subspace Clustering
Authors
Abstract
This paper addresses subspace clustering, which poses a twofold problem: simultaneously identifying the clusters and the specific subspace in which each is defined, and characterizing each cluster with a minimal number of dimensions, so that the results can be presented in a form understandable to an expert in the application domain. The methods proposed so far for this task are restricted to numerical data. The aim of this paper is to propose a subspace clustering algorithm able to handle data described by both continuous and categorical attributes. We present a method based on the classical EM algorithm but operating on a simplified model of the data, followed by an original attribute selection technique that retains only the relevant dimensions of each cluster. The experiments presented next, conducted on both artificial and real databases, show that our algorithm yields robust results in terms of clustering quality and comprehensibility of the resulting clusters.
Similar resources
Subspace Clustering Reloaded: Sparse vs. Dense Representations
State-of-the-art methods for learning unions of subspaces from a collection of data leverage sparsity to form representations of each vector in the dataset with respect to the remaining vectors in the dataset. The resulting sparse representations can be used to form a subspace affinity matrix to cluster the data into their respective subspaces. While sparsity-driven methods for subspace cluster...
A Survey on Soft Subspace Clustering
Subspace clustering (SC) is a promising technology involving clusters that are identified based on their association with subspaces in high-dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). While HSC algorithms have been studied extensively and are well accepted by the scientific community, SSC algorithms are relatively new. However...
Noisy subspace clustering via matching pursuits
Sparsity-based subspace clustering algorithms have attracted significant attention thanks to their excellent performance in practical applications. A prominent example is the sparse subspace clustering (SSC) algorithm by Elhamifar and Vidal, which performs spectral clustering based on an adjacency matrix obtained by sparsely representing each data point in terms of all the other data points via...
A Deterministic Analysis of Noisy Sparse Subspace Clustering for Dimensionality-reduced Data
Subspace clustering groups data into several low-rank subspaces. In this paper, we propose a theoretical framework to analyze a popular optimization-based algorithm, Sparse Subspace Clustering (SSC), when the data dimension is compressed via some random projection algorithms. We show SSC provably succeeds if the random projection is a subspace embedding, which includes random Gaussian projection...
Dimensionality-reduced subspace clustering
Subspace clustering refers to the problem of clustering unlabeled high-dimensional data points into a union of low-dimensional linear subspaces, whose number, orientations, and dimensions are all unknown. In practice one may have access to dimensionality-reduced observations of the data only, resulting, e.g., from undersampling due to complexity and speed constraints on the acquisition device o...
Noisy Sparse Subspace Clustering
This paper considers the problem of subspace clustering under noise. Specifically, we study the behavior of Sparse Subspace Clustering (SSC) when either adversarial or random noise is added to the unlabelled input data points, which are assumed to lie in a union of low-dimensional subspaces. We show that a modified version of SSC is provably effective in correctly identifying the underlying sub...