Algorithms for subsequence combinatorics
Abstract
A subsequence is obtained from a string by deleting any number of characters; thus in contrast to a substring, a subsequence is not necessarily a contiguous part of the string. Counting subsequences under various constraints has become relevant to biological sequence analysis, to machine learning, to the analysis of categorical time series in the social sciences, and to the theory of word complexity. We present theorems that lead to efficient dynamic programming algorithms to count (1) distinct subsequences in a string, (2) distinct common subsequences of two strings, (3) matching joint embeddings in two strings, (4) distinct subsequences with a given minimum span, and (5) sequences generated by a string allowing characters to come in runs of a length that is bounded from above.
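As an illustration of the kind of dynamic programming the abstract refers to, here is a minimal Python sketch for problems (1) and (2). It is not taken from the paper: the first function uses the widely known "last occurrence" recurrence, and the second swaps in a next-occurrence formulation (each distinct common subsequence is counted once through its leftmost embedding) rather than the paper's own theorems. The function names are hypothetical, and both counts include the empty subsequence.

```python
def count_distinct_subsequences(s):
    """Count distinct subsequences of s, including the empty one."""
    total = 1            # only the empty subsequence so far
    before_last = {}     # char -> value of total just before its previous occurrence
    for c in s:
        # Appending c doubles the count, minus the subsequences that were
        # already produced the last time c was appended.
        new_total = 2 * total - before_last.get(c, 0)
        before_last[c] = total
        total = new_total
    return total


def next_occurrence(s):
    """table[i][c] = smallest index p >= i with s[p] == c (absent if none)."""
    n = len(s)
    table = [dict() for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        table[i] = dict(table[i + 1])
        table[i][s[i]] = i
    return table


def count_distinct_common_subsequences(x, y):
    """Count distinct common subsequences of x and y, including the empty one.

    g[i][j] counts distinct common subsequences of x[i:] and y[j:]. A nonempty
    one starts with some character c; matching c at its first occurrence in
    both suffixes gives each distinct subsequence exactly one derivation.
    """
    nx, ny = next_occurrence(x), next_occurrence(y)
    n, m = len(x), len(y)
    g = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            total = 1  # the empty subsequence
            for c, p in nx[i].items():
                q = ny[j].get(c)
                if q is not None:
                    total += g[p + 1][q + 1]
            g[i][j] = total
    return g[0][0]


print(count_distinct_subsequences("aba"))              # 7
print(count_distinct_common_subsequences("ab", "ba"))  # 3
```

The first function runs in time linear in the string length; the second takes time proportional to the product of the two lengths and the alphabet size, which is a simple but not necessarily optimal bound.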
Similar resources
Longest subsequences in permutations
For a class of permutations X, the LXS problem is to identify in a given permutation σ of length n its longest subsequence that is isomorphic to a permutation of X. In general LXS is NP-hard. A general construction that produces polynomial time algorithms for many classes X is given. More efficient algorithms are given when X is defined by avoiding some set of permutations of length 3.
Subsequence Combinatorics and Applications to Microarray Production, DNA Sequencing and Chaining Algorithms
We investigate combinatorial enumeration problems related to subsequences of strings; in contrast to substrings, subsequences need not be contiguous. For a finite alphabet Σ, the following three problems are solved. (1) Number of distinct subsequences: Given a sequence s ∈ Σ^n and a nonnegative integer k ≤ n, how many distinct subsequences of length k does s contain? A previous result by Chase st...
The Longest Filled Common Subsequence Problem
Inspired by a recent approach for genome reconstruction from incomplete data, we consider a variant of the longest common subsequence problem for the comparison of two sequences, one of which is incomplete, i.e. it has some missing elements. The new combinatorial problem, called Longest Filled Common Subsequence, given two sequences A and B, and a multiset M of symbols missing in B, asks for a s...
Monotone Subsequences in High-Dimensional Permutations
This paper is part of the ongoing effort to study high-dimensional permutations. We prove the analogue to the Erdős–Szekeres Theorem: For every k ≥ 1, every order-n k-dimensional permutation contains a monotone subsequence of length Ω_k(√n), and this is tight. On the other hand, and unlike the classical case, the longest monotone subsequence in a random k-dimensional permutation of order n is...
On a Speculated Relation Between Chvátal-Sankoff Constants of Several Sequences
It is well known that, when normalized by n, the expected length of a longest common subsequence of d sequences of length n over an alphabet of size σ converges to a constant γ_{σ,d}. We disprove a speculation by Steele regarding a possible relation between γ_{2,d} and γ_{2,2}. In order to do that we also obtain some new lower bounds for γ_{σ,d}, when both σ and d are small integers.
Recent Progress in Algebraic Combinatorics
We survey three recent breakthroughs in algebraic combinatorics. The first is the proof by Knutson and Tao, and later Derksen and Weyman, of the saturation conjecture for Littlewood-Richardson coefficients. The second is the proof of the n! and (n + 1)^{n−1} conjectures by Haiman. The final breakthrough is the determination by Baik, Deift, and Johansson of the limiting behavior of the length of th...
Journal:
- Theor. Comput. Sci.
Volume 409
Publication date: 2008