A note on the longest common substring with $k$-mismatches problem
Author
Abstract
The recently introduced longest common substring with k-mismatches (k-LCF) problem is to find, given two sequences S1 and S2 of length n each, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is at most k. So far, the only subquadratic-time result for this problem was known for k = 1 [6]. We first present two output-dependent algorithms solving the k-LCF problem and show that for k = O(log^{1−ε} n), where ε > 0, at least one of them works in subquadratic time, using O(n) words of space. The choice between these two algorithms for a given input can be made after linear-time and linear-space preprocessing. Finally, we present a tabulation-based algorithm working, in its range of applicability, in O(n^2 log min(k + l0, σ) / log n) time, where l0 is the length of the standard longest common substring.
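To make the problem statement concrete, here is a minimal Python sketch of a plain quadratic baseline. It is not one of the subquadratic algorithms described in the abstract. It slides a window along every diagonal of the comparison matrix and keeps at most k mismatches inside the window. The function name `k_lcf` and its return convention are assumptions made for illustration.

```python
def k_lcf(s1: str, s2: str, k: int) -> tuple[int, int, int]:
    """Return (length, start in s1, start in s2) of a longest pair of
    equal-length substrings whose Hamming distance is at most k.

    Illustrative baseline only: O(len(s1) * len(s2)) time, O(1) extra space.
    """
    n, m = len(s1), len(s2)
    best = (0, 0, 0)
    # Each diagonal d = i - j aligns s1[i] with s2[i - d].
    for d in range(-(m - 1), n):
        i0 = max(d, 0)                 # first index in s1 on this diagonal
        j0 = i0 - d                    # corresponding first index in s2
        length = min(n - i0, m - j0)   # number of aligned positions
        left = 0                       # window start, as an offset on the diagonal
        mismatches = 0
        for right in range(length):
            if s1[i0 + right] != s2[j0 + right]:
                mismatches += 1
            # Shrink the window from the left until at most k mismatches remain.
            while mismatches > k:
                if s1[i0 + left] != s2[j0 + left]:
                    mismatches -= 1
                left += 1
            if right - left + 1 > best[0]:
                best = (right - left + 1, i0 + left, j0 + left)
    return best
```

For example, `k_lcf("abcde", "abxde", 1)` aligns the two full strings, which differ only in the third position, while with k = 0 the same call reduces to the standard longest common substring.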
Similar resources
Longest Common Substring with Approximately k Mismatches
In the longest common substring problem we are given two strings of length n and must find a substring of maximal length that occurs in both strings. It is well-known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one letter. To circumvent this, Leimeister and Morgenstern introduced the problem of the...
Longest common substrings with k mismatches (arXiv:1409.1694v2 [cs.DS], 16 Mar 2015)
The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is ≤ k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the lengths of S1 and S2, respectively. This algorithm can also be used to compute the matching statistics...
Longest common substrings with k mismatches
The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is ≤ k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the lengths of S1 and S2, respectively. This algorithm can also be used to compute the matching statistics ...
kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison
Motivation: Alignment-based methods for sequence analysis have various limitations if large datasets are to be analysed. Therefore, alignment-free approaches have become popular in recent years. One of the best-known alignment-free methods is the average common substring approach, which defines a distance measure on sequences based on the average length of longest common words between them. Herein...
kmacs: the k-Mismatch Average Common Substring Approach for Phylogeny Reconstruction
The vast majority of sequence comparison methods for phylogeny reconstruction rely on pairwise or multiple sequence alignments. These approaches are in practice not usable for longer sequences such as complete genomes. For this reason alignment-free methods have recently become more popular because they are much faster and usually computable in linear time. Some of these methods are based on re...
Journal title:
- Inf. Process. Lett.
Volume 115, issue
Pages -
Publication date: 2015