Genome-Wide Survival Analysis of Somatic Mutations in Cancer
Abstract
Deriving clinical utility from large genomic datasets requires the identification of statistically significant associations between genomic measurements and a clinical phenotype. An important instance of this problem is to identify genetic variants that distinguish patients with different survival times following diagnosis or treatment. The most widely used statistical test for comparing the survival times of two (or more) classes of samples is the nonparametric log-rank test. Nearly all implementations of the log-rank test rely on the asymptotic normality of the test statistic. However, this approximation gives poor results in many high-throughput genomics applications where: (i) the populations are unbalanced, that is, the population containing a given variant is significantly smaller than the population without that variant; (ii) one tests many possible variants and is interested in those variants with very small p-values that remain significant after multi-hypothesis correction. Our contributions are: (1) We show empirically that the inaccuracy of the asymptotic approximation for the log-rank test results in a large number of false positives in cancer genomics applications due to unbalanced populations, and that of the two exact distributions (permutational and conditional) proposed in the literature, the permutational distribution provides a more accurate approximation of the true p-values. (2) We develop and analyze a novel fully polynomial time approximation scheme (FPTAS) for computing the p-value of the log-rank statistic for any range of population sizes under the exact permutational distribution. (3) We demonstrate the practicality and accuracy of our approach by testing the algorithm on somatic mutation data from The Cancer Genome Atlas (TCGA).
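To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' FPTAS. It computes the two-sample log-rank statistic, its asymptotic normal p-value, and a Monte Carlo estimate of the p-value under the permutational distribution (group labels shuffled while survival times and censoring indicators stay fixed). Monte Carlo sampling is swapped in here for the exact computation described in the abstract. The function names, the two-sided form of the test, and the toy data are all assumptions made for illustration.

```python
import math
import random


def logrank_statistic(times, events, group):
    """Return (U, V): observed minus expected events in group 1, and its variance.

    times  : survival or censoring time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    group  : 1 if the patient carries the variant, 0 otherwise
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    u = 0.0
    v = 0.0
    for t in event_times:
        at_risk = [i for i in range(len(times)) if times[i] >= t]
        n = len(at_risk)
        n1 = sum(group[i] for i in at_risk)
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and group[i])
        u += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance term
            v += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return u, v


def asymptotic_p(times, events, group):
    """Two-sided p-value from the normal approximation U / sqrt(V) ~ N(0, 1)."""
    u, v = logrank_statistic(times, events, group)
    if v == 0:
        return 1.0
    return math.erfc(abs(u / math.sqrt(v)) / math.sqrt(2))


def permutation_p(times, events, group, n_perm=2000, seed=0):
    """Monte Carlo estimate of the two-sided permutational p-value (assumed variant)."""
    rng = random.Random(seed)
    u_obs, _ = logrank_statistic(times, events, group)
    labels = list(group)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute labels; times and events stay fixed
        u, _ = logrank_statistic(times, events, labels)
        if abs(u) >= abs(u_obs) - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical unbalanced cohort: 3 variant carriers among 20 patients.
    times = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
    events = [1] * 15 + [0] * 5
    group = [1, 1, 1] + [0] * 17
    print("asymptotic :", asymptotic_p(times, events, group))
    print("permutation:", permutation_p(times, events, group))
```

A Monte Carlo estimate cannot resolve p-values much below 1/n_perm, which is one reason an exact or provably approximate method matters when only very small p-values survive multi-hypothesis correction.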
Similar resources
NetNorM: Capturing cancer-relevant information in somatic exome mutation data with gene networks for cancer stratification and prognosis
Genome-wide somatic mutation profiles of tumours can now be assessed efficiently and promise to move precision medicine forward. Statistical analysis of mutation profiles is however challenging due to the low frequency of most mutations, the varying mutation rates across tumours, and the presence of a majority of passenger events that hide the contribution of driver events. Here we propose a me...
Binning Somatic Mutations Based on Biological Knowledge for Predicting Survival: An Application in Renal Cell Carcinoma
Enormous efforts of whole exome and genome sequencing from hundreds to thousands of patients have provided the landscape of somatic genomic alterations in many cancer types to distinguish between driver mutations and passenger mutations. Driver mutations show strong associations with cancer clinical outcomes such as survival. However, due to the heterogeneity of tumors, somatic mutation profile...
A Review of Driver Genetic Alterations in Thyroid Cancers
Thyroid cancer is a frequent endocrine related malignancy with continuous increasing incidence. There has been moving development in understanding its molecular pathogenesis recently mainly through the explanation of the original role of several key signaling pathways and related molecular distributors. Central to these mechanisms are the genetic and epigenetic alterations in these pathways, su...
Tumor Mutation Burden Forecasts Outcome in Ovarian Cancer with BRCA1 or BRCA2 Mutations
BACKGROUND Increased number of single nucleotide substitutions is seen in breast and ovarian cancer genomes carrying disease-associated mutations in BRCA1 or BRCA2. The significance of these genome-wide mutations is unknown. We hypothesize genome-wide mutation burden mirrors deficiencies in DNA repair and is associated with treatment outcome in ovarian cancer. METHODS AND RESULTS The total nu...
Mitochondrial Variations in Non-Small Cell Lung Cancer (NSCLC) Survival
Mutations in the mtDNA genome have long been suspected to play an important role in cancer. Although most cancer cells harbor mtDNA mutations, the question of whether such mutations are associated with clinical prognosis of lung cancer remains unclear. We resequenced the entire mitochondrial genomes of tumor tissue from a population of 250 Korean patients with non-small cell lung cancer (NSCLC)...
Accurate Computation of Survival Statistics in Genome-Wide Studies
A key challenge in genomics is to identify genetic variants that distinguish patients with different survival time following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many genomics applications. This is because: the two populations determined by a...