Toward a Practical Data Privacy Scheme for a Distributed Implementation of the Smith-Waterman Genome Sequence Comparison Algorithm
Abstract
Volunteer distributed computations use the spare processor cycles of personal computers connected to the Internet. The resulting platforms provide computational power previously available only through expensive clusters or supercomputers. However, distributed computations running in untrustworthy environments raise a number of security concerns, including computation integrity and data privacy. This paper introduces a strategy for enhancing data privacy in some distributed volunteer computations, a first step toward a general data privacy solution for these computations. The strategy is applied to the Smith-Waterman local nucleotide sequence comparison algorithm. Our modified Smith-Waterman algorithm provides reasonable performance: it identifies most, and in many cases all, of the sequence pairs that exhibit statistically significant similarity according to the unmodified algorithm, with reasonable levels of false positives. Moreover, the modified algorithm achieves a net decrease in execution time with no increase in memory requirements. Most importantly, our scheme represents an important first step toward providing data privacy for a practical, real-world algorithm.
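For readers unfamiliar with the baseline, the following is a minimal sketch of the standard, unmodified Smith-Waterman local alignment score. It is not the privacy-enhanced variant the abstract describes, whose details are not given here. The function name `smith_waterman_score`, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -1) are illustrative assumptions rather than parameters taken from the paper.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Textbook Smith-Waterman with a linear gap penalty. The scoring
    values are illustrative assumptions, not taken from the paper.
    Only two rows of the dynamic-programming matrix are kept, so
    memory use is proportional to len(b).
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

A pair of sequences would then be reported as similar when this score exceeds some significance threshold; how that threshold is chosen, and how the computation is altered to protect the data, is the subject of the paper itself.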
Similar resources
Whole Genome Comparison using Commodity Workstations
Whole genome comparison consists of comparing or aligning two genome sequences in the hope that analogous functional or physical characteristics may be observed. Sequence comparison is done via a number of slow rigorous algorithms, or faster heuristic approaches. However, due to the large size of genomic sequences, the capacity of current software is limited. In this work, we design a parallel...
Aligning Sequences with Non-Affine Gap Penalty: PLAINS Algorithm, a Practical Implementation, and its Biological Applications in Comparative Genomics
In this paper, we consider PLAINS, an algorithm that provides efficient alignment over DNA sequences using piecewise-linear gap penalties that closely approximate more general and meaningful gap-functions. The innovations of PLAINS are fourfold. First, when the number of parts to a piecewise-linear gap function is fixed, PLAINS uses linear space in the worst case, and obtains an alignment that ...
Matching Genetic Sequences in Distributed Adaptive Computing Systems
Distributed adaptive computing systems (ACS) allow developers to design applications using multiple programmable devices. The ACS API, an API created for distributed adaptive computing, gives developers the ability to design scalable ACS systems in a cluster networking environment for large applications. One such application, found in the field of bioinformatics, is the DNA sequence alignment p...
Separating indexes from data: a distributed scheme for secure database outsourcing
Database outsourcing is an idea to eliminate the burden of database management from organizations. Since data is a critical asset of organizations, preserving its privacy from outside adversary and untrusted server should be warranted. In this paper, we present a distributed scheme based on storing shares of data on different servers and separating indexes from data on a distinct server. Shamir...
Acceleration of the Smith-Waterman algorithm using single and multiple graphics processors
Finding regions of similarity between two very long data streams is a computationally intensive problem referred to as sequence alignment. Alignment algorithms must allow for imperfect sequence matching with different starting locations and some gaps and errors between the two data sequences. Perhaps the best-known application of sequence matching is the testing of DNA or protein sequences...