Resistant multiple sparse canonical correlation.

Authors

  • Jacob Coleman
  • Joseph Replogle
  • Gabriel Chandler
  • Johanna Hardin
Abstract

Canonical correlation analysis (CCA) is a multivariate technique that takes two datasets and forms the most highly correlated possible pairs of linear combinations between them. Each subsequent pair of linear combinations is orthogonal to the preceding pairs, so each pair contributes new information. The magnitudes of the coefficients indicate which variables group together, which helps in understanding multiple interactions that are otherwise difficult to compute or grasp intuitively. CCA has powerful applications to high-throughput data: it can be used to discover, for example, relationships between gene expression and gene copy number variation. One of the biggest problems with CCA is that the number of variables (often upwards of 10,000) makes biological interpretation of the linear combinations nearly impossible. To limit the number of variables in the output, we employ sparse canonical correlation analysis (SCCA) combined with estimation that is resistant to extreme observations and other types of deviant data. In this paper, we demonstrate the success of resistant estimation in variable selection using SCCA. Additionally, we use SCCA to find multiple canonical pairs, which gives extended knowledge about the datasets at hand. Again, resistant estimators provide more accurate estimates than standard estimators in the multiple canonical correlation setting. R code is available and documented at https://github.com/hardin47/rmscca.
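The idea of pairing sparse CCA with a resistant estimator can be illustrated with a small sketch. This is not the authors' rmscca implementation: it assumes Spearman rank correlation as a stand-in resistant estimator, a simple alternating soft-thresholding scheme for a single canonical pair, and a threshold set to a fixed fraction of the largest entry. All function names (`cross_correlation`, `soft_unit`, `sparse_pair`) and the simulated data are hypothetical and chosen only for illustration.

```python
import math
import random


def ranks(column):
    """Rank-transform one variable (1 = smallest); assumes no ties."""
    order = sorted(range(len(column)), key=column.__getitem__)
    r = [0.0] * len(column)
    for rank, idx in enumerate(order, start=1):
        r[idx] = float(rank)
    return r


def pearson(a, b):
    """Ordinary sample correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cross_correlation(X_cols, Y_cols, resistant=True):
    """p x q matrix of correlations between X variables and Y variables."""
    if resistant:  # Spearman correlation = Pearson correlation of the ranks
        X_cols = [ranks(c) for c in X_cols]
        Y_cols = [ranks(c) for c in Y_cols]
    return [[pearson(x, y) for y in Y_cols] for x in X_cols]


def soft_unit(vec, frac):
    """Soft-threshold at frac * max|vec| (assumed rule), then scale to unit length."""
    t = frac * max(abs(x) for x in vec)
    s = [math.copysign(max(abs(x) - t, 0.0), x) for x in vec]
    norm = math.sqrt(sum(x * x for x in s))
    return [x / norm for x in s]


def sparse_pair(C, frac=0.5, iters=50):
    """One sparse canonical pair (u, v) by alternating thresholded updates."""
    p, q = len(C), len(C[0])
    # Start v at the Y variable that holds the largest absolute correlation.
    j = max(range(q), key=lambda k: max(abs(C[i][k]) for i in range(p)))
    v = [1.0 if k == j else 0.0 for k in range(q)]
    for _ in range(iters):
        u = soft_unit([sum(C[i][k] * v[k] for k in range(q)) for i in range(p)], frac)
        v = soft_unit([sum(C[i][k] * u[i] for i in range(p)) for k in range(q)], frac)
    return u, v


def support(vec):
    """Indices of the variables with non-zero coefficients."""
    return [i for i, x in enumerate(vec) if x != 0.0]


# Simulated data, stored variable by variable: X has 6 variables, Y has 5.
rng = random.Random(0)
n, p, q = 100, 6, 5
z = [rng.gauss(0, 1) for _ in range(n)]  # shared latent signal
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
Y = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(q)]
for j in (0, 1):  # X variables 0 and 1 carry the signal
    X[j] = [zi + 0.4 * e for zi, e in zip(z, X[j])]
Y[0] = [zi + 0.4 * e for zi, e in zip(z, Y[0])]  # Y variable 0 carries the signal
X[5][0] = Y[4][0] = 100.0  # a single extreme observation in two noise variables

u_res, v_res = sparse_pair(cross_correlation(X, Y, resistant=True))
u_std, v_std = sparse_pair(cross_correlation(X, Y, resistant=False))
print("resistant:", support(u_res), support(v_res))
print("standard: ", support(u_std), support(v_std))
```

In this toy setting a single extreme observation is enough to inflate the ordinary correlation between two pure-noise variables, so the standard fit tends to select them, whereas the rank-based fit should keep the variables that actually share the latent signal. The fraction `frac` controls sparsity and would normally be tuned rather than fixed.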

Similar resources

Sparse Discrimination based Multiset Canonical Correlation Analysis for Multi-Feature Fusion and Recognition

Multiset canonical correlation analysis is a powerful technique for analyzing linear correlations among multiple representation data. However, it usually fails to discover the intrinsic sparse reconstructive relationship and discriminating structure of multiple data spaces in real-world applications. In this paper, by taking discriminative information of within-class and between-class sparse re...

Canonical sparse cross-view correlation analysis

Recently, multi-view feature extraction has attracted great interest, and Canonical Correlation Analysis (CCA) is a powerful technique for finding the linear correlation between two view variable sets. However, CCA does not consider the structure and cross-view information in feature extraction, which is very important for subsequent tasks. In this paper, a new approach called Canonical Sparse ...

Correlating Cellular Features with Gene Expression using CCA

To understand the biology of cancer, joint analysis of multiple data modalities, including imaging and genomics, is crucial. We propose the use of canonical correlation analysis (CCA) and a sparse variant as a preliminary discovery tool for identifying connections across modalities, specifically between gene expression and features describing cell and nucleus shape, texture, and stain intensity...

The RGCCA package for Regularized/Sparse Generalized Canonical Correlation Analysis

2 Multiblock data analysis with the RGCCA package; 2.1 Regularized Generalized Canonical Correlation Analysis; 2.2 Variable selection in RGCCA: SGCCA; 2.3 Higher stage block components; 2.4 Implementatio...

Collaborative regression.

We consider the scenario where one observes an outcome variable and sets of features from multiple assays, all measured on the same set of samples. One approach that has been proposed for dealing with this type of data is "sparse multiple canonical correlation analysis" (sparse mCCA). All of the current sparse mCCA techniques are biconvex and thus have no guarantees about reaching a global opt...

Journal:
  • Statistical applications in genetics and molecular biology

Volume 15, Issue 2

Publication date: 2016