Genome analysis
LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor
Authors
Abstract
Summary: Genomic datasets are often interpreted in the context of large-scale reference databases. One approach is to identify significantly overlapping gene sets, which works well for gene-centric data. However, many types of high-throughput data are based on genomic regions. Locus Overlap Analysis (LOLA) provides easy and automatable enrichment analysis for genomic region sets, thus facilitating the interpretation of functional genomics and epigenomics data.
Availability and Implementation: R package available in Bioconductor and on the following website: http://lola.computational-epigenetics.org.
Contact: [email protected] or [email protected]
Many types of biological data can be interpreted by comparing them to reference databases and searching for interesting patterns of enrichment and depletion. A particularly successful approach focuses on identifying significant overlap between gene sets. To this end, a gene set of interest is compared with a large compendium of existing gene sets with biological annotations, and the observed patterns of overlap are used for interpreting the new gene set. This type of analysis is exemplified by the popular GSEA tool (Subramanian et al., 2005), and it relies on existing gene set annotation databases such as Gene Ontology, KEGG Pathways and MSigDB.
Although gene set analysis has been pivotal for making connections between diverse types of genomic data, this method suffers from one major limitation: it requires gene-centric data. This is becoming increasingly limiting as our understanding of gene regulation advances. Genes are no longer viewed as monolithic building blocks but as multifaceted elements with alternative splicing and alternative promoters, as well as various types of non-coding, antisense and regulatory transcripts. Furthermore, it has become evident that gene expression and chromatin organization are controlled by hundreds of thousands of enhancers and other functional elements, which are often difficult to map to gene symbols.
The increasing emphasis on genomic region sets has been propelled by next-generation sequencing, a technology that produces data most naturally analyzed in the context of genomic regions, for example as peaks and segmentations. Driven by projects such as ENCODE (Encyclopedia of DNA Elements) and IHEC (International Human Epigenome Consortium), the research community has established large catalogs of regulatory elements and other genomic features across many cell types.
Here, we present an R/Bioconductor package called LOLA (Locus Overlap Analysis) for enrichment analysis based on genomic regions. LOLA builds upon analytical concepts that we developed and applied in previous work (Bock et al., 2012; Farlik et al., 2015; Tomazou et al., 2015), and our software makes genomic region set analysis fast and easy for any species with an annotated reference genome. LOLA complements existing tools for gene set analysis (Khatri et al., 2012), tools that convert gene sets into genomic loci such as GREAT (McLean et al., 2010) and the ChIP-Seq Significance Tool (Auerbach et al., 2013), and other related tools including GenometriCorr (Favorov et al., 2012), Genomic HyperBrowser (Sandve et al., 2013), EpiGRAPH (Bock et al., 2009), genomation (Akalin et al., 2014), i-CisTarget (Imrichova et al., 2015) and Genome Track Analyzer (Kravatsky et al., 2015).
© The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Bioinformatics, 32(4), 2016, 587–589. doi: 10.1093/bioinformatics/btv612. Advance Access Publication Date: 27 October 2015.
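The idea of region-based enrichment described in the introduction can be illustrated with a small, self-contained sketch. It is not LOLA's code or API. It is a Python illustration under stated assumptions: regions are hypothetical (chromosome, start, end) tuples with half-open coordinates, the query set is a subset of a user-defined universe, and a one-sided Fisher's exact test is assumed as the enrichment statistic, since the text above does not name the test. All function names are invented for this example.

```python
from math import comb

# Regions are (chromosome, start, end) tuples with half-open coordinates.
# All names below are illustrative; none of them come from the LOLA package.

def overlaps(a, b):
    """Return True if two regions on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def hit_set(regions, reference):
    """Return the regions that overlap at least one reference region (naive O(n*m) scan)."""
    return {r for r in regions if any(overlaps(r, ref) for ref in reference)}

def contingency(query, universe, reference):
    """Build the 2x2 table (a, b, c, d).

    a: query regions overlapping the reference set
    b: query regions not overlapping it
    c: remaining universe regions overlapping it
    d: remaining universe regions not overlapping it
    Assumes the query set is a subset of the universe.
    """
    query = set(query)
    background = set(universe) - query
    a = len(hit_set(query, reference))
    b = len(query) - a
    c = len(hit_set(background, reference))
    d = len(background) - c
    return a, b, c, d

def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test: probability of seeing at least `a` query hits."""
    n = a + b + c + d          # size of the universe
    n_query = a + b            # size of the query set
    n_hits = a + c             # universe regions overlapping the reference set
    tail = sum(
        comb(n_hits, k) * comb(n - n_hits, n_query - k)
        for k in range(a, min(n_query, n_hits) + 1)
    )
    return tail / comb(n, n_query)

# Toy data: ten universe regions on chr1, the first four of which form the query set.
universe = [("chr1", i * 200, i * 200 + 100) for i in range(10)]
query = universe[:4]
reference = [("chr1", 50, 250), ("chr1", 450, 460), ("chr1", 1400, 1450)]

table = contingency(query, universe, reference)
p_value = fisher_enrichment_p(*table)
print(table, p_value)
```

In practice the same table would be computed once per reference region set in a database, and the resulting p-values ranked. The pairwise overlap scan is kept naive for readability; an interval index would be the usual choice for real data.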
Similar resources
Genome-wide Association Study to Identify Genes and Biological Pathways Associated with Type Traits in Cattle using Pathway Analysis
Extended Abstract. Introduction and Objective: Type traits describing the skeletal characteristics of an animal are moderately to strongly genetically correlated with other economically important traits in cattle, including fertility, longevity and carcass traits. The present study aimed to conduct a genome-wide association study (GWAS) based on gene-set enrichment analysis for identifying the ...
DChIPRep, an R/Bioconductor package for differential enrichment analysis in chromatin studies
The genome-wide study of epigenetic states requires the integrative analysis of histone modification ChIP-seq data. Here, we introduce an easy-to-use analytic framework to compare profiles of enrichment in histone modifications around classes of genomic elements, e.g. transcription start sites (TSS). Our framework is available via the user-friendly R/Bioconductor package DChIPRep. DChIPRep uses...
Scalable Genomics with R and Bioconductor
This paper reviews strategies for solving problems encountered when analyzing large genomic data sets and describes the implementation of those strategies in R by packages from the Bioconductor project. We treat the scalable processing, summarization and visualization of big genomic data. The general ideas are well established and include restrictive queries, compression, iteration and parallel...
MEDIPS: genome-wide differential coverage analysis of sequencing data derived from DNA enrichment experiments
Motivation: DNA enrichment followed by sequencing is a versatile tool in molecular biology, with a wide variety of applications including genome-wide analysis of epigenetic marks and mechanisms. A common requirement of these diverse applications is a comparison of read coverage between experimental conditions. The number of samples generated for such comparisons ranges from few replicates to hun...