Improved Cluster Partition in Principal Component Analysis Guided Clustering

Authors

S. M. Shaharudin

N. Ahmad

F. Yusof

Abstract

Principal component analysis (PCA) guided clustering is widely used on high-dimensional data to improve the efficiency of K-means cluster solutions. Typically, PCA uses Pearson correlation as the basis of an eigenanalysis that yields the components accounting for most of the variation in the data. However, Pearson-based PCA can be sensitive to non-Gaussian data that contain skewed observations such as outlying values. Applying Pearson-based PCA to such data can therefore distort cluster partitions and produce extremely imbalanced clusters in high-dimensional space. In this study, Tukey's biweight correlation, based on the M-estimate approach, is used in PCA as an alternative to Pearson correlation. This approach is more resistant to outlying values because it examines each observation and down-weights those that lie far from the center of the data. Two major features are highlighted: (1) fewer components are retained, and imbalanced clusters are avoided at the recommended cumulative percentage of variation threshold; (2) the cluster quality with respect to external, internal and relative criteria, as measured by the Rand, Silhouette and Davies-Bouldin indices, outperforms that of the clusters obtained from Pearson-based PCA.

General Terms: Data Structures and Algorithms.

Similar resources

Principal component methods - hierarchical clustering - partitional clustering: why would we need to choose for visualizing data?

This paper combines three exploratory data analysis methods, principal component methods, hierarchical clustering and partitioning, to enrich the description of the data. Principal component methods are used as a preprocessing step for the clustering in order to denoise the data, transform categorical data into continuous data, or balance groups of variables. The principal component representation ...

Cluster Ensembles for High Dimensional Clustering: An Empirical Study

This paper studies cluster ensembles for high dimensional data clustering. We examine three different approaches to constructing cluster ensembles. To address high dimensionality, we focus on ensemble construction methods that build on two popular dimension reduction techniques, random projection and principal component analysis (PCA). We present evidence showing that ensembles generated by ran...

Using Clustering and Factor Analysis in Cross Section Analysis Based on Economic-Environment Factors

Homogeneity of groups is important in studies that use cross-section and multi-level data. Most studies in economics, especially panel data analyses, need some kind of homogeneity to ensure the validity of their results. This paper presents the methods known as clustering and homogenization of groups in cross-section studies based on enviro-economic components. For this, a sample of 92 countries which...

Validity-guided (re)clustering with applications to image segmentation

When clustering algorithms are applied to image segmentation, the goal is to solve a classification problem. However, these algorithms do not directly optimize classification quality. As a result, they are susceptible to two problems: P1) the criterion they optimize may not be a good estimator of "true" classification quality, and P2) they often admit many (suboptimal) solutions. This paper i...

Publication date: 2013
