On the Linear Number of Matching Substrings
Abstract
We study the number of matching substrings in the pattern matching problem. In general, the number of matching substrings can be quadratic in the size of the given text. A linearizing restriction ensures that at most a linear number of matching substrings is found. We first examine two well-known linearizing restriction rules, the longest-match rule and the shortest-match substring search rule, and show that, although their semantics differ, both rules give the same result when the pattern is an infix-free set. We then introduce a new linearizing restriction, the leftmost non-overlapping match rule, which is suited to find-and-replace operations in text searching, and propose an efficient algorithm for this rule when the pattern is described by a regular expression. We also examine the problem of obtaining the maximal number of non-overlapping matching substrings.
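The abstract does not spell out the rules, so the following is only a brute-force sketch under assumed definitions, not the paper's algorithm. It assumes the longest-match rule keeps, for each start position, the longest match starting there; the shortest-match rule keeps matches that contain no other match as a proper substring; and the leftmost non-overlapping rule repeatedly takes the match with the smallest start at or after the previous end (ties broken toward the longest, an assumption). For contrast, the maximal number of non-overlapping matches is computed with the standard earliest-end greedy for interval scheduling. All function names are hypothetical; patterns are Python `re` regular expressions and empty matches are ignored.

```python
import re

def all_matches(text, pattern):
    """All (start, end) pairs with text[start:end] fully matching pattern.
    Brute force over the O(n^2) nonempty substrings."""
    rx = re.compile(pattern)
    n = len(text)
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)
            if rx.fullmatch(text, i, j)]

def longest_match_rule(matches):
    """Assumed rule: for each start position keep only the longest match."""
    best = {}
    for i, j in matches:
        if j > best.get(i, -1):
            best[i] = j
    return sorted(best.items())

def shortest_match_rule(matches):
    """Assumed rule: keep matches that contain no other match as a substring."""
    s = set(matches)
    return sorted((i, j) for (i, j) in s
                  if not any((a, b) != (i, j) and i <= a and b <= j
                             for (a, b) in s))

def leftmost_nonoverlapping(matches):
    """Assumed rule: take the match with the smallest start at or after the
    cursor (longest first on ties), then resume scanning at its end."""
    chosen, cursor = [], 0
    for i, j in sorted(matches, key=lambda m: (m[0], -m[1])):
        if i >= cursor:
            chosen.append((i, j))
            cursor = j
    return chosen

def max_nonoverlapping(matches):
    """Maximum number of pairwise non-overlapping [start, end) matches:
    greedily take the match that ends earliest (interval scheduling)."""
    chosen, cursor = [], 0
    for i, j in sorted(matches, key=lambda m: m[1]):
        if i >= cursor:
            chosen.append((i, j))
            cursor = j
    return chosen
```

On `"aaaa"` with `a+`, `all_matches` returns all 10 = 4·5/2 substrings, while each of the two rules keeps only 4. With the infix-free pattern `ab|c` on `"abcab"`, both rules return the same three matches; with `a|ab` on `"ab"` they disagree. On `"abcde"` with `abcde|b|d`, the leftmost rule picks only `(0, 5)`, whereas the earliest-end greedy finds two matches, which is why the maximal-count problem is treated separately.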
Similar articles
An Improved Prefix-Free Regular-Expression Matching
We revisit the regular-expression matching problem with respect to prefix-freeness of the pattern. It is known that a prefix-free pattern gives only a linear number of matching substrings in the size of an input text. We improve the previous algorithm and suggest an efficient algorithm that finds all pairs (start, end) of start and end positions of all matching substrings with a single scan of ...
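Why prefix-freeness bounds the output: if two matches began at the same position, the shorter would be a proper prefix of the longer, so a prefix-free pattern has at most one match per start position and hence at most n matches in a text of length n. The sketch below illustrates this for a finite set of strings with a brute-force scan; it is not the single-scan regular-expression algorithm described above, and the function names are hypothetical.

```python
def is_prefix_free(patterns):
    """A finite set is prefix-free iff no string is a proper prefix of its
    lexicographic successor: every string sorted between x and an
    extension of x must itself start with x."""
    words = sorted(set(patterns))
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def match_pairs(text, patterns):
    """Sorted (start, end) pairs of every nonempty pattern occurrence."""
    return sorted((i, i + len(p)) for i in range(len(text)) for p in patterns
                  if p and text.startswith(p, i))
```

For the prefix-free set `{"ab", "b", "ca"}` on `"abcab"`, every start position yields at most one pair, while the non-prefix-free set `{"a", "aa"}` produces several matches sharing a start.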
On a Parallel-Algorithms Method for String Matching Problems
Suffix trees are the main data-structure in string matching algorithmics. There are several serial algorithms for suffix tree construction which run in linear time, but the number of operations in the only parallel algorithm available, due to Apostolico, Iliopoulos, Landau, Schieber and Vishkin, is proportional to n log n. The algorithm is based on labeling substrings, similar to a classical serial...
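The snippet is cut off before naming the classical serial method, so the following is only a generic illustration of substring labeling, not necessarily the technique it refers to. It builds a suffix array (a related structure, not a suffix tree) by prefix doubling: the label of a length-2k substring is derived from the pair of labels of its two length-k halves. The function name is hypothetical, and this serial version runs in O(n log² n) because it re-sorts in each round.

```python
def suffix_array_by_doubling(s):
    """Sort suffix start positions by repeatedly relabeling substrings of
    length 1, 2, 4, ... until every suffix has a distinct label."""
    n = len(s)
    rank = [ord(c) for c in s]  # labels of length-1 substrings
    sa = list(range(n))
    k = 1
    while True:
        # Pair label of s[i:i+k] with label of s[i+k:i+2k] (-1 past the end).
        key = lambda i: (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        new_rank = [0] * n
        for a, b in zip(sa, sa[1:]):
            new_rank[b] = new_rank[a] + (key(a) != key(b))
        rank = new_rank
        if n == 0 or rank[sa[-1]] == n - 1:  # all labels distinct
            return sa
        k *= 2
```

For `"banana"` this returns `[5, 3, 1, 0, 4, 2]`, the start positions of `a`, `ana`, `anana`, `banana`, `na`, `nana`.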
A New Linearizing Restriction in the Pattern Matching Problem
In the pattern matching problem, there can be a quadratic number of matching substrings in the size of a given text. The linearizing restriction finds, at most, a linear number of matching substrings. We first explore two well-known linearizing restriction rules, the longest-match rule and the shortest-match substring search rule, and show that both rules give the same result when a pattern is a...
Efficient algorithms for the longest common subsequence in $k$-length substrings
Finding the longest common subsequence in k-length substrings (LCSk) is a recently proposed problem motivated by computational biology. This is a generalization of the well-known LCS problem in which matching symbols from two sequences A and B are replaced with matching non-overlapping substrings of length k from A and B. We propose several algorithms for LCSk, being non-trivial incarnations of...
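To make the LCSk definition concrete, here is the straightforward dynamic program that follows directly from it; it is written for illustration and is not one of the algorithms proposed in the snippet. The value `dp[i][j]` is the LCSk of the prefixes `a[:i]` and `b[:j]`, and a match of two length-k substrings ending at `i` and `j` extends the solution for `a[:i-k]` and `b[:j-k]`, which keeps the chosen substrings non-overlapping. The function name is hypothetical; the cost is O(|a|·|b|·k) because of the substring comparison.

```python
def lcsk(a, b, k):
    """Maximum number of non-overlapping length-k substrings that occur
    in the same order in both a and b."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Skip the last symbol of a or of b.
            best = max(dp[i - 1][j], dp[i][j - 1])
            # Or match the length-k substrings ending at i and j.
            if i >= k and j >= k and a[i - k:i] == b[j - k:j]:
                best = max(best, dp[i - k][j - k] + 1)
            dp[i][j] = best
    return dp[n][m]
```

With `k = 1` this reduces to the ordinary LCS length, so `lcsk("abcde", "ace", 1)` is 3, while `lcsk("abcde", "abxde", 2)` is 2 (the substrings `ab` and `de`).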
Prefix-Free Regular-Expression Matching
We explore the regular-expression matching problem with respect to prefix-freeness of the pattern. We show that a prefix-free regular expression gives only a linear number of matching substrings in the size of a given text. Based on this observation, we propose an efficient algorithm for the prefix-free regular-expression matching problem. Furthermore, we suggest an algorithm to determine wheth...
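The snippet is truncated, so the following does not reconstruct what that algorithm decides. As a separate illustration of prefix-freeness for regular languages, a standard characterization is: once a DFA is trimmed to states that are both reachable from the start and able to reach a final state, its language is prefix-free exactly when no final state has an outgoing transition. The sketch below assumes a hypothetical encoding of the DFA as a dict mapping `(state, symbol)` to the next state; function names are likewise hypothetical.

```python
from collections import deque

def _reachable(sources, edges):
    """All states reachable from `sources` along the adjacency map `edges`."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        p = queue.popleft()
        for q in edges.get(p, ()):
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return seen

def dfa_is_prefix_free(delta, start, finals):
    """delta maps (state, symbol) -> state. Prefix-free iff no transition
    between useful states leaves a final state."""
    fwd, back = {}, {}
    for (p, _), q in delta.items():
        fwd.setdefault(p, set()).add(q)
        back.setdefault(q, set()).add(p)
    # Useful = reachable from start and co-reachable to some final state.
    useful = _reachable([start], fwd) & _reachable(list(finals), back)
    return not any(p in finals and p in useful and q in useful
                   for (p, _), q in delta.items())
```

For example, a DFA for `ab*c` passes the check, a DFA for `ab*` fails it (`a` is a prefix of `ab`), and a dead state hanging off a final state is ignored after trimming.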
Journal:
- J. UCS
Volume 16, Issue
Pages -
Publication date: 2010