Finding regions of aberrant DNA copy number associated with tumor phenotype
Abstract
DNA copy number alterations are a hallmark of cancer. Understanding their role in tumor progression can help improve diagnosis, prognosis and therapy selection for cancer patients and can contribute to the development of personalised therapies. High-resolution, genome-wide measurements of DNA copy number changes are now available for large cohorts of tumors, owing to the rapid development of technologies such as microarray-based comparative genomic hybridization (arrayCGH). In this manuscript, we introduce a computational pipeline for the statistical analysis of tumor cohorts, which can help extract relevant patterns of copy number aberrations and infer their association with various phenotypic indicators. The pipeline makes use of machine learning techniques for classification and feature selection, with emphasis on interpretable models (linear models with penalties, tree-based models). The main challenges that our methods face are the high dimensionality of the arrays compared to the small number of tumor samples available, as well as the large correlations between copy number estimates measured at neighboring genomic locations. Consequently, feature selection is unstable and depends strongly on the set of training samples, which leads to signatures that cannot be reproduced across different clinical studies. We also show that the feature ranking given by several widely used feature selection methods is biased by the large correlations between features. To correct for the bias and instability of the feature ranking, we introduce a dimension reduction step in our pipeline, consisting of multivariate segmentation of the set of arrays. We present three algorithms for multivariate segmentation, which are based on identifying recurrent DNA breakpoints or DNA regions of constant copy number profile. The multivariate segmentation is the basis for computing a smaller set of super-features, obtained by summarizing the DNA copy number within each segmentation region.
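The summarization step can be illustrated with a minimal sketch. It assumes that a set of shared breakpoints has already been found by some multivariate segmentation method and that each super-feature is the mean log-ratio of the probes inside a segment. The abstract does not state the summary statistic, so the mean is an assumption, and the function name `summarize_segments` and the toy profiles are hypothetical.

```python
from statistics import mean

def summarize_segments(profiles, breakpoints):
    """Collapse probe-level profiles into one value per shared segment.

    profiles:    list of per-sample lists of probe log-ratios (equal length)
    breakpoints: probe indices at which a new segment starts
    The mean is an assumed summary statistic, used here for illustration only.
    """
    n_probes = len(profiles[0])
    # Keep only breakpoints strictly inside the profile, sorted and unique.
    inner = sorted({b for b in breakpoints if 0 < b < n_probes})
    bounds = [0] + inner + [n_probes]
    segments = list(zip(bounds[:-1], bounds[1:]))
    return [[mean(row[start:end]) for start, end in segments] for row in profiles]

# Toy data: 2 samples, 6 probes, breakpoints shared by all samples (made up).
profiles = [
    [0.0, 0.0, 1.0, 1.0, 1.0, 0.0],
    [0.5, 0.5, 0.0, 0.0, 0.0, -1.0],
]
super_features = summarize_segments(profiles, breakpoints=[2, 5])
print(super_features)
```

In this toy case six probe-level features shrink to three segment-level features per sample; on real arrays the reduction depends on how many recurrent breakpoints the segmentation finds.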
Using the super-features for supervised classification, we improve the interpretability and stability of the models, where the baseline for comparison consists of classification models trained on probe data. We validated the methods by training models to predict the phenotype of breast cancers and neuroblastoma tumors. We show that the multivariate segmentation step yields higher model stability without decreasing prediction accuracy. We obtain substantial dimension reduction (up to 200-fold fewer predictors), which recommends the multivariate segmentation procedures not only for phenotype prediction, but also as a preprocessing step for downstream integration with other data types. The interpretability of the models is also improved, revealing important associations between copy number aberrations and phenotype. For example, we show that a very informative predictor that distinguishes between inflammatory and non-inflammatory breast cancers with ERBB2 amplification is the co-amplification of the genomic region located in the immediate vicinity of the ERBB2 gene locus. Therefore, we conclude that the size of the amplicon is associated with the cancer subtype, a hypothesis that has also been proposed elsewhere in the literature. In the case of neuroblastoma tumors, we show that patients belonging to different age subgroups are characterized by distinct copy number patterns, especially when the subgroups are defined as older or younger than 16-18 months. Indeed, across a large set of age cutoffs, our prediction models are most accurate when the cutoff is around 16-18 months. We thereby confirm the recommendation for a higher age cutoff than 12 months
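One common way to quantify the stability of feature selection across training sets is the mean pairwise Jaccard index of the selected feature sets. The sketch below assumes that measure; the abstract does not say which stability measure the authors used. The function names and the feature sets are hypothetical, and the numbers are illustrative inputs, not results from the study.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two feature sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def selection_stability(selections):
    """Mean pairwise Jaccard index over feature sets chosen on resampled data."""
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selected-feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Made-up selections from three training subsets.
# Correlated neighboring probes tend to swap in and out of a signature.
probe_level = [{10, 11, 12}, {11, 12, 13}, {12, 13, 14}]
# Segment-level super-features picked consistently in this toy case.
segment_level = [{2, 5}, {2, 5}, {2, 5}]

probe_stability = selection_stability(probe_level)
segment_stability = selection_stability(segment_level)
print(round(probe_stability, 3), segment_stability)
```

A value near 1 means that nearly the same features are selected regardless of the training subset, while a value near 0 means that the signatures barely overlap.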
Similar resources
Assessment of mitochondrial DNA copy number in peripheral blood leukocyte of opiate abusers and healthy individuals
Background: Previous studies indicate that variation in the mitochondrial DNA (mtDNA) copy number in peripheral blood leukocytes is associated with increased susceptibility to diseases, including cancer. Opiate abusers are at high risk for diseases. In this study, we measured the mtDNA copy number in peripheral blood leukocytes of a group of opiate abusers and compared it with that of healthy individuals. Met...
Methylated circulating tumor DNA in blood: power in cancer prognosis and response.
Circulating tumor DNA (ctDNA) in the plasma or serum of cancer patients provides an opportunity for non-invasive sampling of tumor DNA. This 'liquid biopsy' allows for interrogations of DNA such as quantity, chromosomal alterations, sequence mutations and epigenetic changes, and can be used to guide and improve treatment throughout the course of the disease. This tremendous potential for real-t...
Microduplication of Xp22.31 and MECP2 Pathogenic Variant in a Girl with Rett Syndrome: A Case Report
Rett syndrome (RS) is a neurodevelopmental infantile disease characterized by early normal psychomotor development followed by a regression in the acquisition of normal developmental stages. In the majority of cases, it results from a sporadic mutation in the MECP2 gene, which is located on the X chromosome. However, this syndrome has also been associated with microdeletions, gene translocations...
Relatively Small Contribution of Methylation and Genomic Copy Number Aberration to the Aberrant Expression of Inflammation-Related Genes in HBV-Related Hepatocellular Carcinoma
BACKGROUND: It is well known that chronic inflammation plays a pivotal role in the development of hepatitis B virus (HBV) related hepatocellular carcinoma (HCC). However, the causes of the aberrant expression of inflammation-related genes in HCC remain unclear. METHODS: We performed array-based analyses to comprehensively investigate the contributions of DNA methylation and somatic cop...
Promoter hypermethylation of KLOTHO; an anti-senescence related gene in colorectal cancer patients of Kashmir valley
Hypermethylation of CpG islands located in the promoter regions of genes is a major event in the development of the majority of cancer types, due to the subsequent aberrant silencing of important tumor suppressor genes. KLOTHO, a novel gene associated primarily with suppressing senescence, has been shown to contribute to tumorigenesis as a result of its impaired function. Recently the relevance ...