Visual Analysis of Multi-Dimensional Categorical Data Sets

Authors

  • Bertjan Broeksema
  • Alexandru Telea
  • Thomas Baudel

Abstract

We present a set of interactive techniques for the visual analysis of multi-dimensional categorical data. Our approach is based on multiple correspondence analysis (MCA), which allows one to analyse relationships, patterns, trends and outliers among dependent categorical variables. We use MCA as a dimensionality reduction technique to project both observations and their attributes into the same 2D space. A tree view shows the attributes and their domains, a histogram of how well each is represented in the data set, and a compact overview of attribute-related facts. A second view shows both attributes and observations. We use a Voronoi diagram whose cells can be interactively merged to discover salient attributes, cluster values and bin categories. Bar chart legends help assign meaning to the axes of the 2D view and to 2D point clusters. We illustrate our techniques with real-world application data.
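
To make the projection step concrete, the following is a minimal sketch of textbook MCA computed on an indicator matrix with NumPy. It is not the authors' implementation: the function name `mca_2d`, the input layout (one tuple of string labels per observation) and the sample data are assumptions made for illustration only. It shows how observations and attribute values can end up as points in one shared 2D space.

```python
import numpy as np


def mca_2d(rows):
    """Project observations and categories into a shared 2D space.

    rows: list of equal-length tuples of string labels, one per observation
    (a hypothetical input layout chosen for this sketch).
    """
    n_vars = len(rows[0])
    # One indicator column per (variable index, category label) pair.
    labels = sorted({(j, r[j]) for r in rows for j in range(n_vars)})
    col = {lab: k for k, lab in enumerate(labels)}
    Z = np.zeros((len(rows), len(labels)))
    for i, r in enumerate(rows):
        for j, v in enumerate(r):
            Z[i, col[(j, v)]] = 1.0

    P = Z / Z.sum()              # correspondence matrix
    r_mass = P.sum(axis=1)       # row (observation) masses
    c_mass = P.sum(axis=0)       # column (category) masses
    expected = np.outer(r_mass, c_mass)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals

    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates on the first two axes.
    row_xy = (U[:, :2] * sv[:2]) / np.sqrt(r_mass)[:, None]
    col_xy = (Vt.T[:, :2] * sv[:2]) / np.sqrt(c_mass)[:, None]
    return row_xy, col_xy, labels


# Hypothetical sample: two categorical attributes (colour, size).
sample = [
    ("red", "small"), ("red", "small"), ("blue", "large"),
    ("green", "large"), ("blue", "small"), ("green", "medium"),
]
obs_xy, cat_xy, cat_labels = mca_2d(sample)
```

Here `obs_xy` holds one 2D point per observation and `cat_xy` one per attribute value, in the order given by `cat_labels`, so both can be drawn in the same scatter plot. Observations with identical attribute values land on the same point.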

Similar resources

On Multi-dimensional Markov Chain Models

Markov chain models are commonly used to model categorical data sequences. In this paper, we propose a multi-dimensional Markov chain model for modeling high dimensional categorical data sequences. In particular, the models are practical when there are limited data available. We then test the model with some practical sales demand data. Numerical results indicate the proposed model when compare...

GenoSets: Visual Analytic Methods for Comparative Genomics

Many important questions in biology are, fundamentally, comparative, and this extends to our analysis of a growing number of sequenced genomes. Existing genomic analysis tools are often organized around literal views of genomes as linear strings. Even when information is highly condensed, these views grow cumbersome as larger numbers of genomes are added. Data aggregation and summarization meth...

A novel attribute weighting algorithm for clustering high-dimensional categorical data

Due to data sparseness and attribute redundancy in high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. To effectively address this issue, this paper presents a new optimization algorithm for clustering high-dimensional categorical data, which is an extension of the k-modes clustering algorithm. In the proposed algorithm, a novel weighting techniq...

Method of particles in visual clustering of multi-dimensional and large data sets

A method dedicated to visual clustering of N-dimensional data sets is presented. It is based on the classical feature extraction technique – the Sammon’s mapping. This technique empowered by a particle approach used in the Sammon’s criterion minimization makes the method more reliable, general and efficient. To show its reliability, the results of tests are presented, which were made to exemp...

Visualizing Relationships among Categorical Variables

Centuries of chart-making have produced some outstanding charts tailored specifically to the data being visualized. They have also produced a myriad of less-than-outstanding charts in the same vein. I instead present a set of techniques that may be applied to arbitrary datasets with specific properties. In particular, I describe two techniques – Nested Category Maps and Correlation Maps – for v...

Journal:
  • Comput. Graph. Forum

Volume 32

Publication date: 2013