A Secure Protocol for Computing String Distance Metrics
Authors
Abstract
An important problem is that of finding matching pairs of records from heterogeneous databases, while maintaining privacy of the database parties. As we have shown in earlier work, distance metrics are a useful tool for record-linkage in many domains, and thus secure computation of distance metrics is quite important for secure record-linkage. In this paper, we consider the computation of a number of distance metrics in a secure multiparty setting. Towards this goal, we propose a stochastic scalar product protocol that is provably consistent, and is also as secure as an underlying set-intersection cryptographic protocol. We then use our stochastic dot product protocol to perform secure computation of some standard distance metrics like TFIDF, SoftTFIDF and the Euclidean Distance Metric. Not only are they asymptotically consistent, but experiments show that the stochastic estimates are also quite close to the true values after just 1000 samples. These secure distance computations can then be used to perform secure matching of records.
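The abstract does not spell out the protocol, so the following is only a rough illustration of why a sampling-based scalar product can be consistent. It is not the paper's protocol: both sample lists are compared in the clear here, where a secure version would rely on a cryptographic set-intersection primitive. The function name `stochastic_dot`, its parameters, and the collision-counting estimator are assumptions made for this sketch. The idea is that for non-negative vectors x and y, drawing index i with probability proportional to x and index j with probability proportional to y gives P(i = j) = (x · y) / (sum(x) · sum(y)), so the match rate rescaled by the two sums estimates the dot product.

```python
import random


def stochastic_dot(x, y, n_samples=1000, seed=0):
    """Estimate the dot product of two non-negative vectors by sampling.

    Hypothetical helper for illustration only: the comparison of samples
    is done in plaintext, not through a secure set-intersection protocol.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    zx, zy = sum(x), sum(y)
    if zx == 0 or zy == 0:
        return 0.0  # an all-zero vector has dot product 0 with anything

    rng = random.Random(seed)
    indices = range(len(x))
    # Each party draws indices with probability proportional to its weights.
    xs = rng.choices(indices, weights=x, k=n_samples)
    ys = rng.choices(indices, weights=y, k=n_samples)

    # The fraction of positions where both parties drew the same index
    # estimates sum_i (x_i / zx) * (y_i / zy).
    matches = sum(1 for i, j in zip(xs, ys) if i == j)
    return zx * zy * matches / n_samples


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    y = [4, 3, 2, 1]
    exact = sum(a * b for a, b in zip(x, y))
    print("exact:", exact)
    print("estimate:", stochastic_dot(x, y, n_samples=20000))
```

The estimate converges to the exact value as `n_samples` grows, which is the sense of "asymptotically consistent" used above. How close it gets after a given number of samples depends on the vectors, so this sketch makes no claim about reproducing the reported accuracy.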
Similar resources
Provably secure and efficient identity-based key agreement protocol for independent PKGs using ECC
Key agreement protocols are essential for secure communications in open and distributed environments. Recently, identity-based key agreement protocols have been increasingly researched because of the simplicity of public key management. The basic idea behind an identity-based cryptosystem is that a public key is the identity (an arbitrary string) of a user, and the corresponding private key is ...
Secure Routing Protocol: Affection on MANETs Performance
In mobile ad hoc networks, the absence of infrastructure and the consequent absence of authorization facilities impede the usual practice of establishing a practical criterion for distinguishing nodes as trusted or distrusted. Since all nodes in the MANETs would be used as routers in multi-hop applications, secure routing protocols have a vital role in the security of the network. So evaluating the perf...
Privacy-Preserving Protocols for of Edit Distance and Other Dynamic Programming Algorithms
The edit distance between two strings is the minimum number of delete, insert, and replace operations needed to convert one string into another. Computational biology tasks such as comparing genome sequences of two individuals rely heavily on the dynamic programming algorithm for computing edit distances as well as the algorithms for related string-alignment problems. A genome sequence may reve...
Efficient Privacy-Preserving General Edit Distance and Beyond
Edit distance is an important non-linear metric that has many applications ranging from matching patient genomes to text-based intrusion detection. Depending on the application, related string-comparison metrics, such as weighted edit distance, Needleman-Wunsch distance, longest common subsequences, and heaviest common subsequences, can usually fit better than the basic edit distance. When these ...
A Comparison of String Distance Metrics for Name-Matching Tasks
Using an open-source, Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names. We investigate a number of different metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. Overall, the best-performing method is a hybrid s...