cyvcf2: fast, flexible variant analysis with Python

Authors

  • Brent S. Pedersen
  • Aaron R. Quinlan
Abstract

Motivation: Variant call format (VCF) files document the genetic variation observed after DNA sequencing, alignment and variant calling of a sample cohort. Given the complexity of the VCF format, as well as the diverse variant annotations and genotype metadata, there is a need for fast, flexible methods that enable intuitive analysis of the variant data within VCF and BCF files.

Results: We introduce cyvcf2, a Python library and software package for fast parsing and querying of VCF and BCF files, and illustrate its speed, simplicity and utility.

Contact: [email protected] or [email protected]

Availability and Implementation: cyvcf2 is available from https://github.com/brentp/cyvcf2 under the MIT license and from common Python package managers. Detailed documentation is available at http://brentp.github.io/cyvcf2/.
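The abstract does not show the library's interface. As a rough illustration of the per-variant, per-sample query that cyvcf2 is designed to make fast, the following sketch parses a single tab-delimited VCF data line using only the Python standard library and classifies each sample's genotype. It does not use cyvcf2: the helpers `parse_vcf_line` and `classify_gt`, the sample names and the example record are hypothetical, and the numeric codes are assumed to mirror the 0/1/2/3 (HOM_REF, HET, UNKNOWN, HOM_ALT) convention that the cyvcf2 README describes for `gt_types`.

```python
from typing import Dict, List

# Hypothetical genotype codes, assumed to mirror cyvcf2's default gt_types
# ordering: HOM_REF=0, HET=1, UNKNOWN=2, HOM_ALT=3.
HOM_REF, HET, UNKNOWN, HOM_ALT = 0, 1, 2, 3


def classify_gt(gt: str) -> int:
    """Classify a GT string such as '0/1' or '1|1' (hypothetical helper)."""
    # Treat phased ('|') and unphased ('/') separators the same way.
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return UNKNOWN
    if all(a == "0" for a in alleles):
        return HOM_REF
    if len(set(alleles)) > 1:
        return HET
    return HOM_ALT


def parse_vcf_line(line: str, samples: List[str]) -> Dict[str, object]:
    """Parse one VCF data line (hypothetical helper).

    Keeps CHROM, POS, REF, ALT, INFO and the GT value of each sample.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filt, info, fmt = fields[:9]
    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_dict: Dict[str, object] = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        info_dict[key] = value if value else True
    # Locate GT within the ':'-separated FORMAT keys of each sample column.
    gt_index = fmt.split(":").index("GT")
    gts = [column.split(":")[gt_index] for column in fields[9:]]
    return {
        "CHROM": chrom,
        "POS": int(pos),  # 1-based, as written in the file
        "REF": ref,
        "ALT": alt.split(","),
        "INFO": info_dict,
        "gt_types": {name: classify_gt(gt) for name, gt in zip(samples, gts)},
    }


# Example record with four hypothetical samples.
samples = ["s1", "s2", "s3", "s4"]
line = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=40\tGT:DP\t0/0:10\t0/1:12\t1|1:9\t./.:0"
record = parse_vcf_line(line, samples)
print(record["gt_types"])  # {'s1': 0, 's2': 1, 's3': 3, 's4': 2}
```

In practice the library does this work in compiled code on top of htslib, which also handles bgzip-compressed VCF and BCF input; with cyvcf2 installed, the equivalent loop is roughly `for v in VCF(path): v.gt_types` after `from cyvcf2 import VCF`.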

Similar resources

gSearch: a fast and flexible general search tool for whole-genome sequencing

BACKGROUND Various processes such as annotation and filtering of variants or comparison of variants in different genomes are required in whole-genome or exome analysis pipelines. However, processing different databases and searching among millions of genomic loci is not trivial. RESULTS gSearch compares sequence variants in the Genome Variation Format (GVF) or Variant Call Format (VCF) with a...

Fast Linear Transformations in Python

This paper introduces a new free library for the Python programming language, which provides a collection of structured linear transforms that are not represented as explicit two-dimensional arrays but in a more efficient way, by exploiting the structural knowledge. This allows fast and memory-savvy forward and backward transformations while also providing a clean but still flexible interface to ...

pypet: A Python Toolkit for Data Management of Parameter Explorations

pypet (Python parameter exploration toolkit) is a new multi-platform Python toolkit for managing numerical simulations. Sampling the space of model parameters is a key aspect of simulations and numerical experiments. pypet is designed to allow easy and arbitrary sampling of trajectories through a parameter space beyond simple grid searches. pypet collects and stores both simulation parameters a...

MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories.

As molecular dynamics (MD) simulations continue to evolve into powerful computational tools for studying complex biomolecular systems, the necessity of flexible and easy-to-use software tools for the analysis of these simulations is growing. We have developed MDTraj, a modern, lightweight, and fast software package for analyzing MD simulations. MDTraj reads and writes trajectory data in a wide ...

RoBO: A Flexible and Robust Bayesian Optimization Framework in Python

Bayesian optimization is a powerful approach for the global derivative-free optimization of non-convex expensive functions. Even though there is a rich literature on Bayesian optimization, the source code of advanced methods is rarely available, making it difficult for practitioners to use them and for researchers to compare to and extend them. The BSD-licensed python package ROBO, released wit...

Volume 33
Publication date: 2017