Comparison of dimension reduction-based logistic regression models for case-control genome-wide association study: principal components analysis vs. partial least squares
Authors
Abstract
With recent advances in biotechnology, the genome-wide association study (GWAS) has been widely used to identify genetic variants that underlie complex human diseases and traits. In case-control GWAS, the typical statistical strategy is traditional logistic regression (LR) based on single-locus analysis. However, such a single-locus analysis leads to the well-known multiplicity problem, with a risk of inflating type I error and reducing power. Dimension reduction-based techniques, such as principal component-based logistic regression (PC-LR) and partial least squares-based logistic regression (PLS-LR), have recently gained much attention in the analysis of high-dimensional genomic data. However, the performance of these methods is still not clear, especially in GWAS. We conducted simulations and a real data application to compare the type I error and power of PC-LR, PLS-LR and LR as applied to GWAS within a defined single nucleotide polymorphism (SNP) set region. We found that PC-LR and PLS-LR can reasonably control type I error under the null hypothesis. In contrast, LR, corrected by the Bonferroni method, was more conservative in all simulation settings. In particular, we found that PC-LR and PLS-LR had comparable power and both outperformed LR, especially when the causal SNP was in high linkage disequilibrium with genotyped ones and had a small effect size in simulation. Based on SNP set analysis, we applied all three methods to analyze non-small cell lung cancer GWAS data.
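The three approaches compared in the abstract can be written out schematically as follows. This is only a sketch under assumptions: the abstract does not give the genotype coding, the number of retained components K, or the exact test statistic, so the symbols X, y, m, K, v, w and the joint test on the component coefficients are illustrative choices, not the authors' stated procedure.

```latex
% Assumed notation: n subjects, m SNPs in the set, x_{ij} the genotype
% of subject i at SNP j, y_i \in \{0,1\} the case-control status.

% Single-locus LR: one model per SNP, Bonferroni-corrected over m tests.
\operatorname{logit} P(y_i = 1) = \alpha_j + \beta_j x_{ij},
\qquad H_0^{(j)}: \beta_j = 0 \ \text{tested at level } \alpha / m,
\qquad j = 1, \dots, m

% PC-LR: scores on the first K principal components of the centered
% genotypes (v_k = k-th eigenvector of the sample covariance matrix),
% followed by one logistic regression and one joint test.
z_{ik} = \sum_{j=1}^{m} v_{jk} \, (x_{ij} - \bar{x}_j),
\qquad
\operatorname{logit} P(y_i = 1) = \gamma_0 + \sum_{k=1}^{K} \gamma_k z_{ik},
\qquad H_0: \gamma_1 = \dots = \gamma_K = 0

% PLS-LR: same form, but the weights use the outcome; the first weight
% vector is proportional to the covariance between each SNP and y.
t_{ik} = \sum_{j=1}^{m} w_{jk} \, (x_{ij} - \bar{x}_j),
\qquad
w_{j1} \propto \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(y_i - \bar{y}),
\qquad
\operatorname{logit} P(y_i = 1) = \delta_0 + \sum_{k=1}^{K} \delta_k t_{ik}
```

Written this way, the contrast in the abstract is visible: single-locus LR spends m tests on one SNP set and pays the Bonferroni penalty, while PC-LR and PLS-LR summarize correlated SNPs into K components and test the set once.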
Similar resources
Application of latent variables in the logistic regression model to eliminate the effect of multicollinearity in the analysis of some factors related to breast cancer
Background and Objectives: Logistic regression is one of the most widely used generalized linear models for analysis of the relationships between one or more explanatory variables and a categorical response. Strong correlations among explanatory variables (multicollinearity) reduce the efficiency of the model to a considerable degree. In this study we used latent variables to reduce the effects of ...
Boiling Points Predictions Study via Dimension Reduction Methods: SIR, PCR and PLSR
Variable selection is an important tool in QSAR. In this article, we employ three known techniques: sliced inverse regression (SIR), principal components regression (PCR) and partial least squares regression (PLSR) for models to predict the boiling points of 530 saturated hydrocarbons. With 122 topological indices as input variables our results show that these three methods have good performanc...
STA 4107/5107 Statistical Learning: Principle Components and Partial Least Squares Regression
Principal components analysis is traditionally presented as an interpretive multivariate technique, where the loadings are chosen to maximally explain the variance in the variable. However, we will consider it here mainly as a statistical learning tool, by using the derived components in a least squares regression to predict unobserved response variables using the principal components. Principa...
On partial least squares dimension reduction for microarray-based classification: a simulation study
In microarray tumor tissue classification studies, the expressions of thousands of genes (variables) are simultaneously measured across a few tissue samples. Standard statistical methodologies in classification do not work well when the dimension, p, is greater than the sample size, N. One approach to classification problems, when p > N, is to first apply a dimension reduction method and then perfo...
Functional Data Analysis of Spectroscopic Data with Application to Classification of Colon Polyps
In this study, two functional logistic regression models with functional principal component basis (FPCA) and functional partial least squares basis (FPLS) have been developed to distinguish precancerous adenomatous polyps from hyperplastic polyps for the purpose of classification and interpretation. The classification performances of the two functional models have been compared with two widely...