A Primer on High-Throughput Computing for Genomic Selection
Abstract
High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of sustaining high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding, and when several traits must be evaluated sequentially, total computing time is long and throughput is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented with techniques ranging from simple batch processing to pipelining in distributed computer clusters. Scripting languages such as shell scripts, Perl, and R are also very useful for devising pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In contrast to the traditional data processing pipeline residing on central processors, general-purpose computation on a graphics processing unit (GPU) provides a new-generation approach to massively parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin-Madison, and can be leveraged for genomic selection in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet the increasing computing demands posed by the unprecedented volume of genomic data available today.
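The contrast between evaluating traits one after another and submitting them as a batch can be made concrete with a small sketch. It assumes, purely for illustration, a ridge-regression marker model in the spirit of RR-BLUP (the abstract does not name a specific model), centered phenotypes with no intercept, and Python's `concurrent.futures` standing in for a cluster scheduler. The names `fit_ridge`, `predict`, `trait_job`, and `evaluate_traits` are hypothetical. Each trait is an independent train-then-predict job, so all jobs can be submitted at once; this shows batch-level parallelism across traits, not the stage-overlapping pipelining the paper also describes.

```python
"""Toy sketch: per-trait genomic prediction jobs run as an HTC-style batch.

Assumptions (hypothetical, not from the paper): a ridge-regression marker
model (RR-BLUP-like), centered phenotypes with no intercept, and
concurrent.futures standing in for a cluster scheduler.
"""
# ProcessPoolExecutor sidesteps the GIL for CPU-bound fits;
# ThreadPoolExecutor is imported only as a convenient option for smoke tests.
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple, Type

Matrix = List[List[float]]
# (trait name, training genotypes, training phenotypes, candidate genotypes, lambda)
Job = Tuple[str, Matrix, List[float], Matrix, float]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(x: Matrix, y: Sequence[float], lam: float) -> List[float]:
    """Training step: marker effects b from (X'X + lam*I) b = X'y."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(x: Matrix, b: Sequence[float]) -> List[float]:
    """Prediction step: estimated genetic merit of candidates, X_cand b."""
    return [sum(xi * bi for xi, bi in zip(row, b)) for row in x]


def trait_job(job: Job) -> Tuple[str, List[float]]:
    """One independent unit of work: train on one trait, predict candidates."""
    name, x_train, y, x_cand, lam = job
    return name, predict(x_cand, fit_ridge(x_train, y, lam))


def evaluate_traits(jobs: Sequence[Job],
                    executor_cls: Type[Executor] = ProcessPoolExecutor,
                    max_workers: int = 4) -> Dict[str, List[float]]:
    """Submit all trait jobs together instead of running them sequentially."""
    with executor_cls(max_workers=max_workers) as pool:
        return dict(pool.map(trait_job, jobs))
```

A call such as `evaluate_traits(jobs, ProcessPoolExecutor, max_workers=8)` spreads independent trait evaluations over local cores; on a real cluster each `Job` would instead become a separately scheduled task. On platforms that start worker processes with `spawn` (macOS, Windows), the call should sit under an `if __name__ == "__main__":` guard. `ThreadPoolExecutor` works for quick checks but, because of the global interpreter lock, does not speed up CPU-bound pure-Python fits.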
We anticipate that HTC will impact genomic selection through better statistical models, faster solutions, and more competitive products (e.g., from the design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis and decision-making in the post-genomic era of selection programs in animals and plants, as well as in the study of complex diseases in humans.