Title : APPROXIMATION ALGORITHMS FOR POINT PATTERN MATCHING AND SEARCHING
Authors: Minkyoung Cho
Abstract
Minkyoung Cho, Doctor of Philosophy, 2010
Directed by: Professor David M. Mount, Department of Computer Science

Point pattern matching is a fundamental problem in computational geometry. Given a reference set and a pattern set, the problem is to find a geometric transformation that, when applied to the pattern set, minimizes some given distance measure with respect to the reference set. This problem has been heavily researched under various distance measures and error models. Point set similarity searching is a variation of this problem in which a large database of point sets is given, and the task is to preprocess this database into a data structure so that, given a query point set, it is possible to rapidly find the nearest point set among the elements of the database. Here, the term nearest is understood in the above sense of pattern matching, where the elements of the database may be transformed to match the given query set. The approach presented here is to compute a low-distortion embedding of the pattern matching problem into an (ideally) low-dimensional metric space and then apply any standard algorithm for nearest neighbor searching over this metric space.

The main focus of this dissertation is on two problems in the area of point pattern matching and searching algorithms: (1) improving the accuracy of alignment-based point pattern matching and (2) computing low-distortion embeddings of point sets into vector spaces. For the first problem, new methods are presented for matching point sets based on alignments of small subsets of points. It is shown that these methods lead to better approximation bounds for alignment-based planar point pattern matching algorithms under the Hausdorff distance. Furthermore, it is shown that these approximation bounds are nearly the best achievable by alignment-based methods. For the second problem, results are presented for two different distance measures.
First, point pattern similarity search under translation is considered for point sets in multidimensional integer space, where the distance function is the symmetric difference. A randomized embedding into real space under the L1 metric is given, and the algorithm achieves an expected distortion of O(log n). Second, an algorithm is given for embedding R^d under the Earth Mover's Distance (EMD) into multidimensional integer space under the symmetric difference distance. This embedding achieves a distortion of O(log Δ), where Δ is the diameter of the point set. Combining this with the above result implies that point pattern similarity search with translation under the EMD can be embedded into real space in the L1 metric with an expected distortion of O(log n · log Δ).
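The three distance measures named in this abstract (the Hausdorff distance, the symmetric difference under translation, and the EMD) can be made concrete with small brute-force definitions. The Python sketch below is illustrative only and is not the dissertation's algorithms or embeddings. It assumes unit-weight point sets given as coordinate tuples, uses translation-only alignment as a simplified stand-in for the rigid-motion alignments of small subsets studied in the thesis, and restricts the EMD to equal-size sets, where it reduces to a minimum-cost perfect matching. All function names are hypothetical.

```python
import math
from itertools import permutations


def directed_hausdorff(P, Q):
    """h(P, Q): the largest distance from a point of P to its nearest point of Q."""
    return max(min(math.dist(p, q) for q in Q) for p in P)


def translate(P, t):
    """Shift every point of P by the vector t."""
    return [tuple(pc + tc for pc, tc in zip(p, t)) for p in P]


def align_by_translation(P, Q):
    """Hypothetical alignment-based matcher, translation only: try each
    translation that maps the fixed pattern point P[0] onto a reference point,
    and return (h(P + t, Q), t) for the best one found."""
    best = None
    for q in Q:
        t = tuple(qc - pc for qc, pc in zip(q, P[0]))
        d = directed_hausdorff(translate(P, t), Q)
        if best is None or d < best[0]:
            best = (d, t)
    return best


def symdiff_under_translation(A, B):
    """min over integer translations t of |(A + t) symmetric-difference B|.
    Any t that makes A + t overlap B has the form b - a, and every other t
    gives |A| + |B|, so checking only these candidates is exact."""
    A, B = set(A), set(B)
    best = len(A) + len(B)
    for a in A:
        for b in B:
            t = tuple(bc - ac for bc, ac in zip(b, a))
            best = min(best, len(set(translate(A, t)) ^ B))
    return best


def emd_equal_size(P, Q):
    """EMD between two unit-weight point sets of equal size: the minimum total
    Euclidean cost of a perfect matching (brute force, tiny inputs only)."""
    if len(P) != len(Q):
        raise ValueError("this sketch only handles equal-size sets")
    return min(sum(math.dist(p, q) for p, q in zip(P, perm))
               for perm in permutations(Q))


if __name__ == "__main__":
    A = [(0, 0), (1, 0), (2, 0)]
    print(symdiff_under_translation(A, [(5, 5), (6, 5), (7, 5)]))  # 0
    print(symdiff_under_translation(A, [(5, 5), (6, 5), (9, 9)]))  # 2
    print(align_by_translation([(0, 0), (1, 0)], [(3, 3), (4, 3), (10, 10)]))  # (0.0, (3, 3))
    print(emd_equal_size([(0, 0), (1, 0)], [(0, 1), (1, 1)]))  # 2.0
```

Brute force keeps each definition easy to check against the text. Its costs (a quadratic number of candidate translations, a factorial number of matchings) illustrate why faster approximate methods, such as low-distortion embeddings followed by nearest neighbor search in L1, are of interest for large databases.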
Similar resources
Practical Methods for Approximate String Matching
Given a pattern string and a text, the task of approximate string matching is to find all locations in the text that are similar to the pattern. This type of search may be done for example in applications of spelling error correction or bioinformatics. Typically edit distance is used as the measure of similarity (or distance) between two strings. In this thesis we concentrate on unit-cost edit ...
Sequential and indexed two-dimensional combinatorial template matching allowing rotations
We present new and faster algorithms to search for a 2-dimensional pattern in a 2-dimensional text allowing any rotation of the pattern. This has applications such as image databases and computational biology. We consider the cases of exact and approximate matching under several matching models, using a combinatorial approach that generalizes string matching techniques. We focus on sequential a...
On-line Approximate String Matching in Natural Language
We consider approximate pattern matching in natural language text. We use the words of the text as the alphabet, instead of the characters as in traditional string matching approaches. Hence our pattern consists of a sequence of words. From the algorithmic point of view this has several advantages: (i) the number of words is much less than the number of characters, which in effect means shorter...
Average-optimal string matching
The exact string matching problem is to find the occurrences of a pattern of length m from a text of length n symbols. We develop a novel and unorthodox filtering technique for this problem. Our method is based on transforming the problem into multiple matching of carefully chosen pattern subsequences. While this is seemingly more difficult than the original problem, we show that the idea leads...
Exact and Approximate Two Dimensional Pattern Matching allowing Rotations
We give fast filtering algorithms for searching a 2-dimensional pattern in a 2-dimensional text allowing any rotation of the pattern. We consider the cases of exact and approximate matching under several matching models, improving the previous results. For a text of size n × n characters and a pattern of size m × m characters, the exact matching takes average time O(n^2/m). If we allow k-mismatches o...