Variable selection for generalized canonical correlation analysis.

Authors

  • Arthur Tenenhaus
  • Cathy Philippe
  • Vincent Guillemot
  • Kim-Anh Le Cao
  • Jacques Grill
  • Vincent Frouin
Abstract

Regularized generalized canonical correlation analysis (RGCCA) is a generalization of regularized canonical correlation analysis to 3 or more sets of variables. RGCCA is a component-based approach which aims to study the relationships between several sets of variables. The quality and interpretability of the RGCCA components are likely to be affected by the usefulness and relevance of the variables in each block. Therefore, it is important to identify, within each block, which subsets of significant variables are active in the relationships between blocks. In this paper, RGCCA is extended to address the issue of variable selection. Specifically, sparse generalized canonical correlation analysis (SGCCA) is proposed to combine RGCCA with an $\ell_1$-penalty in a unified framework. Within this framework, blocks are not necessarily fully connected, which makes SGCCA a flexible method for analyzing a wide variety of practical problems. Finally, the versatility and usefulness of SGCCA are illustrated on a simulated dataset and on a 3-block dataset which combines gene expression, comparative genomic hybridization, and a qualitative phenotype measured on a set of 53 children with glioma. SGCCA is available on CRAN as part of the RGCCA package.
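
To make the role of the $\ell_1$-penalty concrete, the following is a minimal Python sketch of how soft-thresholding yields sparse weight vectors when maximizing the covariance between two block components. It is a simplified two-block illustration with fixed thresholds, not the SGCCA algorithm of the paper nor the API of the RGCCA package; the function names and the parameters `lam1`, `lam2`, and `n_iter` are hypothetical.

```python
import numpy as np


def soft_threshold(v, lam):
    """Elementwise soft-thresholding: shrink each entry toward zero by lam."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_unit_vector(v, lam):
    """Soft-threshold v, then rescale to unit l2 norm.

    Returns the zero vector if every entry is thresholded away.
    """
    s = soft_threshold(v, lam)
    norm = np.linalg.norm(s)
    return s / norm if norm > 0 else s


def sparse_two_block(X1, X2, lam1, lam2, n_iter=50):
    """Alternate sparse updates of the weight vectors a1 and a2.

    Assumption: fixed thresholds lam1 and lam2 (a penalized form) are used
    instead of searching for the threshold that meets an l1-norm bound.
    """
    # Center each block so that X1c.T @ X2c / (n - 1) is a cross-covariance.
    X1c = X1 - X1.mean(axis=0)
    X2c = X2 - X2.mean(axis=0)
    C = X1c.T @ X2c / (X1.shape[0] - 1)

    # Start from a dense unit-norm weight vector for the second block.
    a2 = np.ones(X2.shape[1]) / np.sqrt(X2.shape[1])
    a1 = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        a1 = sparse_unit_vector(C @ a2, lam1)
        a2 = sparse_unit_vector(C.T @ a1, lam2)
    return a1, a2
```

In this sketch, larger values of `lam1` and `lam2` set more weights exactly to zero, which is how an $\ell_1$-type penalty performs variable selection within each block. When only a few variables per block carry the shared signal, the weights on pure-noise variables are expected to be thresholded to zero.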

Similar resources

The RGCCA package for Regularized/Sparse Generalized Canonical Correlation Analysis

2 Multiblock data analysis with the RGCCA package
2.1 Regularized Generalized Canonical Correlation Analysis
2.2 Variable selection in RGCCA: SGCCA
2.3 Higher stage block components
2.4 Implementatio...

Applying Variable Deletion Strategies in Bankruptcy Studies to Capture Common Information and Increase Their Reality

In financial distress studies, selection of variables is commonly based on the success of variables in variable sets employed in earlier bankruptcy studies, suggestions in the literature or an accompanying data reduction in a large set of variables. If seemingly different variable sets exhibit a strong relationship then heterogeneous variable sets capture common information. Canonical correlation anal...

The subselect R package

The subselect package addresses the issue of variable selection in different statistical contexts, including exploratory data analysis; univariate or multivariate linear models; generalized linear models; principal components analysis; linear discriminant analysis; and canonical correlation analysis. Selecting variable subsets requires the definition of a numerical criterion which measures the qu...

Generalization of Canonical Correlation Analysis from Multivariate to Functional Cases and its related problems

In multivariate cases, the aim of canonical correlation analysis (CCA) for two sets of variables x and y is to obtain linear combinations of them so that they have the largest possible correlation. However, when x and y are by nature continuous functions of another variable (generally time), these two functions belong to function spaces which are of infinite dimension, and CCA for them should b...

Canonical Correlation Analysis for Determination of Relationship between Morphological and Physiological Pollinated Characteristics in Five Varieties of Phalaenopsis

Phalaenopsis is an important genus of orchids that is grown for the commercial production of cut flowers and potted plants. The objective of this study is to evaluate the correlation between morphological and physiological traits under self- and cross-pollination in 5 varieties of Phalaenopsis orchid. Some morphological traits were measured: Capsule length (CL), capsule volume (CV), weight of seeds in...

Journal:
  • Biostatistics

Volume 15, Issue 3

Publication date: 2014