Automated deconvolution of structured mixtures from heterogeneous tumor genomic data
Abstract
With increasing appreciation for the extent and importance of intratumor heterogeneity, much attention in cancer research has focused on profiling heterogeneity at the single-patient level. Although true single-cell genomic technologies are rapidly improving, they remain too noisy and costly at present for population-level studies. Bulk sequencing remains the standard for population-scale tumor genomics, creating a need for computational tools to separate contributions of multiple tumor clones and assorted stromal and infiltrating cell populations to pooled genomic data. All such methods, however, are limited to coarse approximations of only a few cell subpopulations. In prior work, we demonstrated the feasibility of improving cell type deconvolution by taking advantage of substructure in genomic mixtures via a strategy called simplicial complex unmixing. We improve on past work by introducing enhancements to automate learning of substructured genomic mixtures, with specific emphasis on genome-wide copy number variation (CNV) data, as well as the ability to process quantitative RNA expression data and heterogeneous combinations of RNA and CNV data. We introduce methods for dimensionality estimation to better decompose mixture model substructure; fuzzy clustering to better identify substructure in sparse, noisy data; and automated model inference methods for other key model parameters. We further demonstrate their effectiveness in identifying mixture substructure in real breast cancer CNV data from The Cancer Genome Atlas (TCGA). Source code is available at https://github.com/tedroman/WSCUnmix.
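To make the fuzzy clustering idea in the abstract concrete, the following is a minimal sketch of generic fuzzy c-means in Python with NumPy. It is not taken from the WSCUnmix repository and may differ from the authors' method: the function name `fuzzy_c_means`, the fuzzifier `m`, the iteration count, and the synthetic two-group data are all assumptions made for illustration. The point it shows is that each sample receives a graded membership in every cluster rather than a single hard label.

```python
import numpy as np


def fuzzy_c_means(X, k, m=2.0, n_iter=100, seed=0):
    """Generic fuzzy c-means (hypothetical helper, not the WSCUnmix code).

    X: (n_samples, n_features) array; k: number of clusters;
    m: fuzzifier > 1 (larger values give softer memberships).
    Returns (U, centers), where U[i, j] is the membership of sample i
    in cluster j and each row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships; Dirichlet draws make each row sum to 1.
    U = rng.dirichlet(np.ones(k), size=X.shape[0])
    for _ in range(n_iter):
        # Cluster centers are membership-weighted means of the samples.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        # Standard membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers


# Synthetic stand-in for noisy profiles drawn from two groups
# (assumed toy data, not TCGA data).
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, scale=0.3, size=(20, 5))
group_b = rng.normal(loc=5.0, scale=0.3, size=(20, 5))
X = np.vstack([group_a, group_b])

U, centers = fuzzy_c_means(X, k=2)
labels = U.argmax(axis=1)  # hard labels, only for inspection
```

The soft memberships in `U` are what distinguish this from k-means: a sample lying between groups keeps partial weight in each, which is one reason fuzzy approaches are attractive for sparse, noisy data.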
Similar resources
An assessment of computational methods for estimating purity and clonality using genomic data derived from heterogeneous tumor tissue samples
Solid tumor samples typically contain multiple distinct clonal populations of cancer cells, and also stromal and immune cell contamination. A majority of the cancer genomics and transcriptomics studies do not explicitly consider genetic heterogeneity and impurity, and draw inferences based on mixed populations of cells. Deconvolution of genomic data from heterogeneous samples provides a powerfu...
Computational de novo discovery of distinguishing genes for biological processes and cell types in complex tissues
Bulk tissue samples examined by gene expression studies are usually heterogeneous. The data gained from these samples display the confounding patterns of mixtures consisting of multiple cell types or similar cell types in various functional states, which hinders the elucidation of the molecular mechanisms underlying complex biological phenomena. A realistic approach to compensate for the limita...
Collision-energy resolved ion mobility characterization of isomeric mixtures.
Existing instrumental resolving power limitations in ion mobility spectrometry (IMS) often restrict adequate characterization of unresolved or co-eluting chemical isomers. Recently, we introduced a novel chemometric deconvolution approach that utilized post-IM collision-induced dissociation (CID) mass spectrometry (MS) data to extract "pure" IM profiles and construct CID mass spectra of individ...
Analysis of gin essential oil mixtures by multidimensional and one-dimensional gas chromatography/mass spectrometry with spectral deconvolution.
The composition of essential oils and their mixtures used to formulate gin is usually too complex to separate all sample components by standard capillary gas chromatography (GC). In particular, minor constituents that possess important organoleptic properties can be masked by co-elution with major sample components. A solution is provided that combines gas chromatography/mass spectrometry (GC/M...
An Automated Spectral Deconvolution Algorithm: Application to Thermal Infrared Studies of Earth and Mars
Introduction: The linear mixing of thermal infrared (TIR) emission spectra in multi-mineralic mixtures has been proven, and its limits and applicability have been quantitatively investigated [1,2]. Limiting factors in the accuracy of any linear retrieval (spectral deconvolution) algorithm include the spectral precision of the instrumentation as well as the fact that the number of end-members mu...