From Sequencer to Supercomputer: An Automatic Pipeline for Managing and Processing Next Generation Sequencing Data

Authors

Terry Camerlengo

Hatice Gulcin Ozer

Raghuram Onti-Srinivasan

Pearlly Yan

Tim Huang

Jeffrey Parvin

Kun Huang

Abstract

Next Generation Sequencing (NGS) is highly resource intensive. NGS tasks related to data processing, management, and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture, built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline, to address the challenges associated with the resource-intensive nature of NGS secondary analysis. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40 TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.
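The abstract does not describe how the Automation Server decides when a run is ready for processing. As a purely illustrative sketch, and not the authors' implementation, the Python below shows one common way such automation can work: scan a shared storage directory for run folders that contain a completion marker file, then queue each new run for the cluster. The marker file name `run_complete.txt`, the `PipelineJob` type, and the functions `find_completed_runs` and `submit_all` are all hypothetical names introduced for this example. Submission is a stand-in that only records run IDs.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Set

# Hypothetical marker file signalling that a sequencing run has finished writing.
COMPLETION_MARKER = "run_complete.txt"


@dataclass(frozen=True)
class PipelineJob:
    """One unit of work for the cluster: a finished run directory (hypothetical type)."""
    run_id: str
    run_dir: Path


def find_completed_runs(storage_root: Path, already_submitted: Set[str]) -> List[PipelineJob]:
    """Return a job for every finished run directory that has not been submitted yet."""
    jobs = []
    for run_dir in sorted(p for p in storage_root.iterdir() if p.is_dir()):
        if run_dir.name in already_submitted:
            continue  # this run was queued on an earlier scan
        if (run_dir / COMPLETION_MARKER).exists():
            jobs.append(PipelineJob(run_id=run_dir.name, run_dir=run_dir))
    return jobs


def submit_all(jobs: List[PipelineJob], already_submitted: Set[str]) -> List[str]:
    """Stand-in for cluster submission: record each run ID and return the IDs handled."""
    submitted = []
    for job in jobs:
        # A real system would hand the job to a cluster scheduler here.
        already_submitted.add(job.run_id)
        submitted.append(job.run_id)
    return submitted
```

Calling `find_completed_runs` on a schedule and passing the result to `submit_all` means each finished run is queued once, with no manual step. This is the kind of hands-off behavior the abstract attributes to the platform.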

Similar resources

DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data

High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analyti...

Data processing on a large scale

Within the last decade the high costs and complexity of Next Generation Sequencing (NGS) data organization put pressure on NGS data centres to organize convenient IT service infrastructures for automatic data management, processing and analyses. Our market analysis showed that existing applications processing NGS data were insufficiently documented, not extensible or strongly dependent on the u...

A high performance computational environment for UHTS studies

This work regards the use of high performance computing (HPC) methods for a new bioinformatics challenge: the analysis of Terabyte-size data generated by the new ultra high throughput sequencing (UHTS) technology. As in microarray or mass spectrometry cases, public repositories are growing to store data from the next generation studies produced in laboratories around the world. These can be use...

Computational Challenges of Next Generation Sequencing Pipelines Using Heterogeneous Systems

We are rapidly entering the era of genomics. The dramatic cost reduction of DNA sequencing due to the introduction of Next Generation Sequencing (NGS) techniques has resulted in an exponential growth of genetics data. The amount of data generated, and its associated processing into useful information, poses serious computational challenges. Here, we give a brief introduction of NGS, show a typi...

Mobile Genome Express (MGE): A comprehensive automatic genetic analyses pipeline with a mobile device

The development of next-generation sequencing (NGS) technology makes it possible to sequence whole exomes or genomes. However, data analysis is still the biggest bottleneck for its wide implementation. Most laboratories still depend on manual procedures for data handling and analyses, which translates into a delay and decreased efficiency in the delivery of NGS results to doctors and patients. Thus, there ...

Volume 2012

Publication date: 2012
