Principal Component Analysis, A Powerful Unsupervised Learning Technique

Author

  • George C. J. Fernandez
Abstract

Data mining is a collection of analytical techniques for uncovering new trends and patterns in massive databases. These techniques stress visualization, both to study the structure of the data thoroughly and to check the validity of the fitted statistical model, which in turn supports proactive decision making. Principal component analysis (PCA) is one of the unsupervised data mining tools used to reduce dimensionality in multivariate data. In PCA, dimensionality is reduced by transforming the correlated original variables into uncorrelated linear combinations of them. PCA thus summarizes the variation in correlated multi-attribute data as a set of uncorrelated components, each of which is a particular linear combination of the original variables. The extracted uncorrelated components are called principal components (PCs) and are estimated from the eigenvectors of the covariance or correlation matrix of the original variables. The objective of PCA is therefore to achieve parsimony: to extract the smallest number of components that account for most of the variation in the original multivariate data, and so to summarize the data with little loss of information. A user-friendly SAS application that uses the latest capabilities of the SAS macro language to perform PCA is presented here. Previously published diabetes screening data containing multiple attributes are used to illustrate reducing those attributes to a few dimensions.
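
The procedure summarized in the abstract (eigen-decomposition of the covariance or correlation matrix, then keeping the fewest components that explain most of the variation) can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the SAS macro application the paper presents. The function names `pca` and `n_components_for`, the 90% variance threshold, and the synthetic four-variable data set are all assumptions made for the example; the diabetes screening data are not reproduced here.

```python
import numpy as np


def pca(X, use_correlation=True):
    """Return eigenvalues, eigenvectors (as columns), and component scores.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable
    if use_correlation:
        Xc = Xc / X.std(axis=0, ddof=1)      # standardize: PCA on the correlation matrix
    C = np.cov(Xc, rowvar=False)             # covariance (or correlation) matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh is meant for symmetric matrices
    order = np.argsort(eigvals)[::-1]        # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # uncorrelated principal component scores
    return eigvals, eigvecs, scores


def n_components_for(eigvals, threshold=0.9):
    """Smallest number of components whose cumulative variance share reaches threshold."""
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, threshold)) + 1
    return min(k, len(eigvals))


# Synthetic correlated data (an assumption for the example): four observed
# variables driven mostly by one latent factor plus independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
loadings = np.array([[1.0, 0.8, 0.6, 0.4]])
X = latent @ loadings + 0.3 * rng.normal(size=(200, 4))

eigvals, eigvecs, scores = pca(X)
k = n_components_for(eigvals, threshold=0.9)
reduced = scores[:, :k]                      # data summarized in k dimensions
print("share of variance per component:", eigvals / eigvals.sum())
print("components kept:", k)
```

Setting `use_correlation=False` decomposes the covariance matrix instead, which is usually reasonable only when the variables share a common scale.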


Similar resources

3D Object Pose Inference via Kernel Principal Component Analysis with Image Euclidian Distance (IMED)

Kernel Principal Component Analysis (KPCA) is a powerful non-linear unsupervised learning technique for high dimensional pattern analysis. KPCA on images, however, usually considers each image pixel as an independent dimension and does not take into account the spatial relationship of nearby pixels. In this paper, we show how the Image Euclidian Distance (IMED), which takes into account local p...


On-line Learning of Prototypes and Principal Components

We review our recent investigation of on-line unsupervised learning from high-dimensional structured data. First, on-line competitive learning is studied as a method for the identification of prototype vectors from overlapping clusters of examples. Specifically, we analyse the dynamics of the well-known winner-takes-all or K-means algorithm. As a second standard learning technique, the applicatio...


Finding Structure in Signals, Images, and Data

The talk will be a tutorial survey, concentrating on the main principles and categories of unsupervised neural learning in the problem of data mining for signals, images, and data. In neural computation, there are two classical categories for unsupervised learning methods and models: first, extensions of Principal Component Analysis and Factor Analysis, and second, learning vector coding or clu...


The Analysis of Multispectral Image Data with Self-organizing Feature Maps

The analysis of multispectral sceneries is still a challenging task although many different algorithms for a mathematical analysis exist. Most classification algorithms work in a supervised mode only, i.e. they need to know the possible land usage classes prior to the calculation. In this step many simplifications and assumptions have to be done which directly influence the result. KOHONEN's se...


Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning

Domain adaptation is a powerful technique given a wide amount of labeled data from similar attributes in different domains. In real-world applications, there is a huge number of data but almost more of them are unlabeled. It is effective in image classification where it is expensive and time-consuming to obtain adequate label data. We propose a novel method named DALRRL, which consists of deep ...



Publication date: 2003