Canonical Correlation Analysis for Gene-Based Pleiotropy Discovery
Abstract
Genome-wide association studies have identified a wealth of genetic variants involved in complex traits and multifactorial diseases. There is now considerable interest in testing variants for association with multiple phenotypes (pleiotropy) and in testing multiple variants for association with a single phenotype (gene-based association tests). Such approaches can increase statistical power by combining evidence for association over multiple phenotypes or multiple genetic variants, respectively. Canonical Correlation Analysis (CCA) measures the correlation between two sets of multidimensional variables, and thus offers the potential to combine these two approaches. To apply CCA, we must restrict the number of attributes relative to the number of samples. Hence we consider modules of genetic variation that can comprise a gene, a pathway or another biologically relevant grouping, and/or a set of phenotypes. To do this, we use an attribute selection strategy based on a binary genetic algorithm. Applied to a UK-based prospective cohort study of 4286 women (the British Women's Heart and Health Study), the method shows improved statistical power in the detection of previously reported genetic associations and identifies a number of novel pleiotropic associations between genetic variants and phenotypes. New discoveries include gene-based association of NSF with triglyceride levels and of several genes (ACSM3, ERI2, IL18RAP, IL23RAP and NRG1) with left ventricular hypertrophy phenotypes. In multiple-phenotype analyses we find association of NRG1 with left ventricular hypertrophy phenotypes, fibrinogen and urea, and pleiotropic relationships of F7 and F10 with Factor VII, Factor IX and cholesterol levels.
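The role of CCA described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `canonical_correlations`, the simulated genotype and phenotype matrices, and the effect size are all assumptions made for illustration. The sketch computes canonical correlations with a standard QR-plus-SVD approach in NumPy, and it also shows why the number of attributes has to be restricted relative to the number of samples: when there are at least as many attributes as samples, the canonical correlations are trivially equal to 1.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X (n x p) and Y (n x q).

    Hypothetical helper for illustration: centers both blocks, builds an
    orthonormal basis for each with a reduced QR decomposition, and takes
    the singular values of the product of the two bases.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)  # orthonormal basis for the variant block
    Qy, _ = np.linalg.qr(Yc)  # orthonormal basis for the phenotype block
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1


rng = np.random.default_rng(0)

# Simulated module (assumed data): 5 variants coded 0/1/2 and 3 phenotypes,
# with the first phenotype depending on the first variant.
n = 500
G = rng.binomial(2, 0.3, size=(n, 5)).astype(float)
P = rng.normal(size=(n, 3))
P[:, 0] += 1.0 * G[:, 0]
rho = canonical_correlations(G, P)

# Too many attributes for the sample size: 25 variants, only 20 samples.
G_small = rng.binomial(2, 0.3, size=(20, 25)).astype(float)
P_small = rng.normal(size=(20, 3))
rho_overfit = canonical_correlations(G_small, P_small)

print("canonical correlations, n=500:", rho)
print("canonical correlations, n=20, p=25:", rho_overfit)
```

With a single variant and a single phenotype, the first canonical correlation reduces to the absolute Pearson correlation, so the gene-based and multiple-phenotype settings can be read as generalizations of the usual single-variant, single-phenotype test.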
Related articles
P-215: Discovery of A Novel APA Variant of A Human Potential Gene Based on Expressed Sequenced Tags Analysis
Background: Expressed sequence tags (ESTs) are sequences of cDNA fragments prepared from different tissue sources. There are over one million of these sequences in the publicly available database, and these sequences are believed to represent more than half of all human genes. The ESTs belong to different cDNA libraries, each prepared from one particular cell type, organ, or tumor. Therefore, th...
Canonical Correlation Analysis for Determination of Relationship between Morphological and Physiological Pollinated Characteristics in Five Varieties of Phalaenopsis
Phalaenopsis is an important genus of orchids that is grown for the commercial production of cut flowers and potted plants. The objective of this study is to evaluate the correlation between morphological and physiological traits under self- and cross-pollination in 5 varieties of Phalaenopsis orchid. Some morphological traits were measured: Capsule length (CL), capsule volume (CV), weight of seeds in...
Canonical Analysis of the Relationship between Components of Professional Ethics and Dimensions of Social Responsibility
Background: Today, professional ethics and social responsibility play an important role in organizations. This study aimed to conduct a canonical analysis of the relationship between components of professional ethics and dimensions of social responsibility among the first high school teachers in the Naghadeh province. Method: In terms of purpose, this study is applied, and in terms of data collec...
A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data
Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and geneti...
Correlating Cellular Features with Gene Expression using CCA
To understand the biology of cancer, joint analysis of multiple data modalities, including imaging and genomics, is crucial. We propose the use of canonical correlation analysis (CCA) and a sparse variant as a preliminary discovery tool for identifying connections across modalities, specifically between gene expression and features describing cell and nucleus shape, texture, and stain intensity...