Firth logistic regression for rare variant association tests

Author

  • Xuefeng Wang
Abstract

Single-variant association tests are known to be impractical at sites with a low minor allele frequency or count, because their results are either underpowered or unreliable. Joint analyses that pool or “collapse” multiple variants according to annotated gene groupings are therefore preferred in rare variant association tests. The problem nevertheless persists in a genome-wide association scan, because some regions always contain only a few variant sites. Moreover, most current exome or genome sequencing association studies are still limited to small sample sizes, so standard tests that rely on asymptotic theory do not preserve the type I error rate. Together, these factors distort the final genome-wide quantile-quantile plot of the test p-values. A penalized-likelihood method, Firth logistic regression, may provide a simple yet effective solution. It is easier to implement and less computationally intensive than alternatives such as permutation or bootstrapping, and it deserves more attention in association studies of sequencing data. The basic idea of Firth logistic regression is to modify the score function by adding a term that counteracts the first-order term in the asymptotic expansion of the bias of the maximum likelihood estimate; this term goes to zero as the sample size increases (Firth, 1993; Heinze and Schemper, 2002). For generalized linear models with canonical links, such as logistic regression, Firth’s approach is equivalent to penalizing the likelihood by the Jeffreys invariant prior. The attraction of the method is that it reduces bias in small samples and yields finite, consistent estimates even under separation.
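As an illustration of the modified score described above, the following Python sketch fits the simplest possible case: one variant coded as a carrier indicator, an intercept, and no covariates. It is a minimal sketch under stated assumptions, not the implementation of any package. The function name `fit_single_variant`, the plain Fisher-scoring loop without step-halving, and the toy data (in which every carrier is a case) are all hypothetical choices made for this example.

```python
import math


def expit(eta):
    """Logistic function 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_single_variant(g, y, firth=True, max_iter=100, tol=1e-10):
    """Fisher scoring for logit P(y=1) = b0 + b1 * g (one variant, no covariates).

    With firth=True each score contribution is y_i - pi_i + h_i * (0.5 - pi_i),
    where h_i is the i-th diagonal element of the hat matrix.
    With firth=False this is ordinary maximum likelihood.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        pi = [expit(b0 + b1 * gi) for gi in g]
        w = [p * (1.0 - p) for p in pi]
        # Fisher information X'WX for the two-column design [1, g].
        a = sum(w)
        b = sum(wi * gi for wi, gi in zip(w, g))
        d = sum(wi * gi * gi for wi, gi in zip(w, g))
        det = a * d - b * b
        i00, i01, i11 = d / det, -b / det, a / det  # inverse information
        if firth:
            # Leverages h_i = w_i * x_i' (X'WX)^{-1} x_i.
            h = [wi * (i00 + 2.0 * i01 * gi + i11 * gi * gi)
                 for wi, gi in zip(w, g)]
        else:
            h = [0.0] * len(g)
        r = [yi - p + hi * (0.5 - p) for yi, p, hi in zip(y, pi, h)]
        u0 = sum(r)
        u1 = sum(ri * gi for ri, gi in zip(r, g))
        # Scoring step: inverse information times (modified) score.
        s0 = i00 * u0 + i01 * u1
        s1 = i01 * u0 + i11 * u1
        b0, b1 = b0 + s0, b1 + s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1


# Toy data (made up): 5 carriers, all cases; 95 non-carriers, 40 of them cases.
g = [1] * 5 + [0] * 95
y = [1] * 5 + [1] * 40 + [0] * 55

firth_b0, firth_b1 = fit_single_variant(g, y, firth=True)
# Ordinary ML has no finite maximum here, so it is stopped after 20 iterations.
ml_b0, ml_b1 = fit_single_variant(g, y, firth=False, max_iter=20)

# For a single binary carrier indicator, the Firth estimate matches the
# log odds ratio of the 2x2 table with 0.5 added to every cell.
closed_form = math.log((5 + 0.5) / (0 + 0.5)) - math.log((40 + 0.5) / (55 + 0.5))

print(f"Firth slope: {firth_b1:.4f} (closed form {closed_form:.4f})")
print(f"ML slope after 20 iterations: {ml_b1:.2f} (still growing)")
```

In this saturated two-group setting the penalized estimate can be checked against the 2x2 table with one half added to each cell, which is why the sketch compares against `closed_form`. With covariates no such shortcut exists, and a general-purpose implementation would normally add safeguards such as step-halving.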
In a binary response model, separation occurs when a variant is associated with only one type of outcome, e.g., when all individuals who carry a particular (albeit rare) variant are diagnosed with the disease. The phenomenon is more common in rare variant studies, especially when a recessive model is assumed. Such variants are undoubtedly important, but standard statistical packages will fail to detect them because they often report large p-values (and exceptionally large standard errors), sometimes without even a warning message. Although approaches such as Fisher’s exact test and exact logistic regression can handle separation, their use becomes problematic when continuous covariates need to be considered. Firth logistic regression is fairly easy to apply because it is now available in many standard packages (such as the R package “logistf”). In recent work, Ma et al. (2013) performed simulations comparing different methods for rare variant association tests across varied designs and reported promising results. They showed that Firth-regression-based joint analysis of individual-level data controls the type I error well in both balanced and unbalanced studies and is more powerful than score-test-based meta-analysis. However, methods and software have yet to be developed for analyses of family or otherwise related samples. Two options are available for handling familial correlations. One is to incorporate the Firth correction into conditional logistic regression (CLR) (Heinze and Puhr, 2010). The other, which may be easier, is based on generalized estimating equations (GEE). A simple approximation can be readily applied in practice by modifying standard GEE in the following two steps.
First, obtain the leverage values (the diagonal of the hat matrix) from a GEE analysis with an independence working correlation; then add half of each observation's leverage to its response and rerun GEE with the chosen working correlation matrix. This procedure will not completely remove the first-order term of the bias, but it adjusts in that direction. The approximation guarantees finite estimates when separation occurs. Further investigation is needed, however, to test the robustness of the suggested methods to factors such as ascertainment and pedigree structure.
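For orientation, the leverage adjustment can be related to Firth's penalized likelihood and modified score in the independent-sample case (Firth, 1993; Heinze and Schemper, 2002). The notation below is assumed for illustration and does not appear in the abstract: $x_i$ is the covariate vector of observation $i$, $\pi_i$ its fitted probability, $X$ the design matrix, and $h_i$ the $i$-th diagonal element of the hat matrix.

```latex
\begin{align*}
\ell^{*}(\beta) &= \ell(\beta) + \tfrac{1}{2}\log\bigl\lvert I(\beta)\bigr\rvert,
  \qquad I(\beta) = X^{\top} W X, \quad W = \operatorname{diag}\{\pi_i(1-\pi_i)\}, \\
U^{*}(\beta) &= \sum_{i=1}^{n} \Bigl[\, y_i - \pi_i + h_i\bigl(\tfrac{1}{2} - \pi_i\bigr) \Bigr]\, x_i \\
             &= \sum_{i=1}^{n} \Bigl[\, \bigl(y_i + \tfrac{h_i}{2}\bigr) - (1 + h_i)\,\pi_i \Bigr]\, x_i,
  \qquad h_i = \pi_i(1-\pi_i)\, x_i^{\top} \bigl(X^{\top} W X\bigr)^{-1} x_i .
\end{align*}
```

The last form shows where "half a leverage added to each response" comes from. In the two-step approximation the $h_i$ are taken once from the independence fit rather than updated at every iteration, and how the factor $(1 + h_i)$ should be treated inside a GEE fit is not specified in the text, so it is left open here.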

Journal title:

Volume 5, Issue

Pages -

Publication date: 2014