Privacy-Enhanced Methods for Comparing Compressed DNA Sequences

نویسندگان

  • David Eppstein
  • Michael T. Goodrich
  • Pierre Baldi
چکیده

In this paper, we study methods for improving the efficiency and privacy of compressed DNA sequence comparison computations, under various querying scenarios. For instance, one scenario involves a querier, Bob, who wants to test if his DNA string, Q, is close to a DNA string, Y , owned by a data owner, Alice, but Bob does not want to reveal Q to Alice and Alice is willing to reveal Y to Bob only if it is close to Q. We describe a privacy-enhanced method for comparing two compressed DNA sequences, which can be used to achieve the goals of such a scenario. Our method involves a reduction to set differencing, and we describe a privacy-enhanced protocol for set differencing that achieves absolute privacy for Bob (in the information theoretic sense), and a quantifiable degree of privacy protection for Alice. One of the important features of our protocols, which makes them ideally suited to privacy-enhanced DNA sequence comparison problems, is that the communication complexity of our solutions is proportional to a threshold that bounds the cardinality of the set differences that are of interest, rather than the cardinality of the sets involved (which correlates to the length of the DNA sequences). Moreover, in our protocols, the querier, Bob, can easily compute the set difference only if its cardinality is close to or below a specified threshold.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sbvrldnacomp:an Effective Dna Sequence Compression Algorithm

There are plenty specific types of data which are needed to compress for easy storage and to reduce overall retrieval times. Moreover, compressed sequence can be used to understand similarities between biological sequences. DNA data compression challenge has become a major task for many researchers for the last few years as a result of exponential increase of produced sequences in gene database...

متن کامل

Practical aspects of Compressed Suffix Arrays and FM-Index in Searching DNA Sequences

Searching patterns in the DNA sequence is an important step in biological research. To speed up the search process, one can index the DNA sequence. However, classical indexing data structures like suffix trees and suffix arrays are not feasible for indexing DNA sequences due to main memory requirement, as DNA sequences can be very long. In this paper, we evaluate the performance of two compress...

متن کامل

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...

متن کامل

Protecting Genomic Privacy by a Sequence-Similarity Based Obfuscation Method

In the post-genomic era, large-scale personal DNA sequences are produced and collected for genetic medical diagnoses and new drug discovery, which, however, simultaneously poses serious challenges to the protection of personal genomic privacy. Existing genomic privacy-protection methods are either time-consuming or with low accuracy. To tackle these problems, this paper proposes a sequence simi...

متن کامل

Alignment by numbers: sequence assembly using compressed numerical representations

Motivation: DNA sequencing instruments are enabling genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and interpret sequence data. Established methods for computational sequence analysis generally use nucleotide-level resolution of sequences, and while such approaches can be very accurate, increasingly ambitious and data-intensive analyses are...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1107.3593  شماره 

صفحات  -

تاریخ انتشار 2011