cyvcf2: fast, flexible variant analysis with Python
Abstract
Motivation: Variant call format (VCF) files document the genetic variation observed after DNA sequencing, alignment and variant calling of a sample cohort. Given the complexity of the VCF format, as well as the diverse variant annotations and genotype metadata, there is a need for fast, flexible methods that enable intuitive analysis of the variant data within VCF and BCF files. Results: We introduce cyvcf2, a Python library and software package for fast parsing and querying of VCF and BCF files, and illustrate its speed, simplicity and utility. Contact: [email protected] or [email protected]. Availability and Implementation: cyvcf2 is available from https://github.com/brentp/cyvcf2 under the MIT license and from common Python package managers. Detailed documentation is available at http://brentp.github.io/cyvcf2/.
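To make the "complexity of the VCF format" concrete, the following is a minimal sketch of what parsing a single VCF data line involves: eight fixed columns, a semicolon-delimited INFO field that mixes key=value pairs with bare flags, and colon-delimited per-sample genotype fields. It uses only the Python standard library and is not cyvcf2's API or implementation; the function names `parse_info` and `parse_record`, the sample names, and the example record are all hypothetical.

```python
# Illustrative only: a plain-Python parse of one VCF data line.
# This is NOT cyvcf2's API or implementation; all names here are hypothetical.

def parse_info(info_field):
    """Turn 'DP=14;AF=0.5;DB' into {'DP': '14', 'AF': '0.5', 'DB': True}."""
    info = {}
    if info_field == ".":  # '.' marks a missing INFO column
        return info
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True  # flags carry no value
    return info


def parse_record(line, sample_names):
    """Split a tab-delimited VCF data line into fixed columns and per-sample fields."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = cols[:8]
    record = {
        "CHROM": chrom,
        "POS": int(pos),
        "ID": None if vid == "." else vid,
        "REF": ref,
        "ALT": [] if alt == "." else alt.split(","),
        "QUAL": None if qual == "." else float(qual),
        "FILTER": filt,
        "INFO": parse_info(info),
        "samples": {},
    }
    if len(cols) > 9:
        # Column 9 (FORMAT) names the colon-separated keys used by every sample column.
        keys = cols[8].split(":")
        for name, raw in zip(sample_names, cols[9:]):
            record["samples"][name] = dict(zip(keys, raw.split(":")))
    return record


# Hypothetical record with two made-up samples.
line = "1\t12345\trs1\tA\tG,T\t50.0\tPASS\tDP=14;AF=0.5,0.1;DB\tGT:DP\t0/1:8\t1/2:6"
rec = parse_record(line, ["sampleA", "sampleB"])
print(rec["POS"], rec["ALT"], rec["INFO"]["AF"], rec["samples"]["sampleB"]["GT"])
```

Even this sketch leaves values as untyped strings and ignores the header, which declares each field's type and cardinality. Handling those details quickly across millions of records is the kind of work a dedicated parsing library takes on.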
Similar resources
gSearch: a fast and flexible general search tool for whole-genome sequencing
BACKGROUND: Various processes such as annotation and filtering of variants or comparison of variants in different genomes are required in whole-genome or exome analysis pipelines. However, processing different databases and searching among millions of genomic loci is not trivial. RESULTS: gSearch compares sequence variants in the Genome Variation Format (GVF) or Variant Call Format (VCF) with a...
Fast Linear Transformations in Python
This paper introduces a new free library for the Python programming language that provides a collection of structured linear transforms, which are represented not as explicit two-dimensional arrays but in a more efficient way that exploits their structure. This allows fast and memory-savvy forward and backward transformations while also providing a clean but still flexible interface to ...
pypet: A Python Toolkit for Data Management of Parameter Explorations
pypet (Python parameter exploration toolkit) is a new multi-platform Python toolkit for managing numerical simulations. Sampling the space of model parameters is a key aspect of simulations and numerical experiments. pypet is designed to allow easy and arbitrary sampling of trajectories through a parameter space beyond simple grid searches. pypet collects and stores both simulation parameters a...
MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories.
As molecular dynamics (MD) simulations continue to evolve into powerful computational tools for studying complex biomolecular systems, the necessity of flexible and easy-to-use software tools for the analysis of these simulations is growing. We have developed MDTraj, a modern, lightweight, and fast software package for analyzing MD simulations. MDTraj reads and writes trajectory data in a wide ...
RoBO: A Flexible and Robust Bayesian Optimization Framework in Python
Bayesian optimization is a powerful approach for the global derivative-free optimization of non-convex expensive functions. Even though there is a rich literature on Bayesian optimization, the source code of advanced methods is rarely available, making it difficult for practitioners to use them and for researchers to compare to and extend them. The BSD-licensed python package ROBO, released wit...