Algorithms for subsequence combinatorics

Authors

  • Cees H. Elzinga
  • Sven Rahmann
  • Hui Wang

Abstract

A subsequence is obtained from a string by deleting any number of characters; thus in contrast to a substring, a subsequence is not necessarily a contiguous part of the string. Counting subsequences under various constraints has become relevant to biological sequence analysis, to machine learning, to the analysis of categorical time series in the social sciences, and to the theory of word complexity. We present theorems that lead to efficient dynamic programming algorithms to count (1) distinct subsequences in a string, (2) distinct common subsequences of two strings, (3) matching joint embeddings in two strings, (4) distinct subsequences with a given minimum span, and (5) sequences generated by a string allowing characters to come in runs of a length that is bounded from above.
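The abstract names the counting problems but gives no formulas. Below is a minimal Python sketch of problems (1) to (3). It is built on standard textbook recurrences, not necessarily the theorems proved in the paper, and it rests on two assumptions. First, the empty subsequence is counted. Second, a "matching joint embedding" means a pair of equal-length increasing index sequences, one in each string, whose characters agree position by position. All function names are hypothetical.

```python
def distinct_subsequences(s: str) -> int:
    """Count distinct subsequences of s, including the empty one (assumption)."""
    dp = [1]   # dp[i] = number of distinct subsequences of s[:i]
    last = {}  # last[c] = 1-based index of the previous occurrence of c
    for i, c in enumerate(s, 1):
        count = 2 * dp[i - 1]  # every old subsequence, with and without c appended
        if c in last:
            # u + c for u taken from s[:last[c]-1] was already counted earlier
            count -= dp[last[c] - 1]
        dp.append(count)
        last[c] = i
    return dp[-1]


def _next_occurrence(s: str) -> list:
    """table[i][c] = smallest p >= i with s[p] == c (key absent if none)."""
    table = [dict() for _ in range(len(s) + 1)]
    for i in range(len(s) - 1, -1, -1):
        table[i] = dict(table[i + 1])
        table[i][s[i]] = i
    return table


def distinct_common_subsequences(x: str, y: str) -> int:
    """Count distinct strings that are subsequences of both x and y (empty included).

    Each distinct common subsequence has exactly one leftmost (greedy)
    embedding in each string, so branching on the next matching character
    counts every string exactly once.
    """
    nx, ny = _next_occurrence(x), _next_occurrence(y)
    n, m = len(x), len(y)
    f = [[1] * (m + 1) for _ in range(n + 1)]  # f[i][j]: count for x[i:], y[j:]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            total = 1  # the empty subsequence
            for c, p in nx[i].items():
                q = ny[j].get(c)
                if q is not None:
                    total += f[p + 1][q + 1]
            f[i][j] = total
    return f[0][0]


def joint_embeddings(x: str, y: str) -> int:
    """Count pairs (i1<...<ik in x, j1<...<jk in y) with x[i_t] == y[j_t], k >= 0."""
    n, m = len(x), len(y)
    E = [[1] * (m + 1) for _ in range(n + 1)]  # row 0 / column 0: only the empty pair
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # inclusion-exclusion: pairs not using x's last or y's last position
            E[i][j] = E[i - 1][j] + E[i][j - 1] - E[i - 1][j - 1]
            if x[i - 1] == y[j - 1]:
                E[i][j] += E[i - 1][j - 1]  # pairs whose last match is (i, j)
    return E[n][m]


print(distinct_subsequences("aba"))              # 7
print(distinct_common_subsequences("ab", "ba"))  # 3
print(joint_embeddings("aa", "a"))               # 3
```

The greedy formulation of (2) runs in O(|x|·|y|·|Σ|) time. A per-character recurrence in the style of (1) would avoid the |Σ| factor, but the greedy version is easier to check by hand.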

Similar resources

Longest subsequences in permutations

For a class of permutations X, the LXS problem is to identify in a given permutation σ of length n its longest subsequence that is isomorphic to a permutation of X. In general, LXS is NP-hard. A general construction that produces polynomial-time algorithms for many classes X is given. More efficient algorithms are given when X is defined by avoiding some set of permutations of length 3.
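A familiar special case helps ground the LXS problem. Take X to be the class of increasing permutations, equivalently those avoiding the pattern 21. LXS then becomes the longest increasing subsequence problem, which patience sorting solves in O(n log n). The sketch below is that textbook algorithm, not the general construction described in the abstract. The function name is illustrative.

```python
from bisect import bisect_left


def longest_increasing_subsequence(seq):
    """Return one longest strictly increasing subsequence of seq in O(n log n)."""
    tails = []              # tails[k]: smallest last value of an increasing run of length k+1
    tail_idx = []           # tail_idx[k]: position in seq of that last value
    prev = [-1] * len(seq)  # predecessor links used to rebuild the answer
    for i, v in enumerate(seq):
        k = bisect_left(tails, v)  # first pile whose top is >= v
        if k == len(tails):
            tails.append(v)
            tail_idx.append(i)
        else:
            tails[k] = v
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1
    out = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        out.append(seq[i])
        i = prev[i]
    return out[::-1]


print(longest_increasing_subsequence([3, 1, 4, 2, 5]))  # [1, 2, 5]
```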

Subsequence Combinatorics and Applications to Microarray Production, DNA Sequencing and Chaining Algorithms

We investigate combinatorial enumeration problems related to subsequences of strings; in contrast to substrings, subsequences need not be contiguous. For a finite alphabet Σ, the following three problems are solved. (1) Number of distinct subsequences: Given a sequence s ∈ Σ^n and a nonnegative integer k ≤ n, how many distinct subsequences of length k does s contain? A previous result by Chase st...
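Problem (1) above asks for distinct subsequences of a fixed length k. A minimal sketch follows. It uses a standard O(n²) recurrence that refines the "double, then subtract the duplicates from the previous occurrence" rule by length. It is not necessarily the method of that paper or of the truncated result it cites, and the function name is hypothetical.

```python
def distinct_subsequences_by_length(s: str) -> list:
    """counts[k] = number of distinct subsequences of s of length k, k = 0..len(s)."""
    n = len(s)
    N = [[0] * (n + 1) for _ in range(n + 1)]  # N[i][k]: length-k subsequences of s[:i]
    N[0][0] = 1
    last = {}  # 1-based index of the previous occurrence of each character
    for i in range(1, n + 1):
        c = s[i - 1]
        N[i][0] = 1
        for k in range(1, i + 1):
            # either skip s[i-1], or append it to a length-(k-1) subsequence of s[:i-1]
            N[i][k] = N[i - 1][k] + N[i - 1][k - 1]
            if c in last:
                # u + c with u from s[:last[c]-1] was already counted at c's previous occurrence
                N[i][k] -= N[last[c] - 1][k - 1]
        last[c] = i
    return N[n]


print(distinct_subsequences_by_length("aba"))  # [1, 2, 3, 1]
```

Summing the returned list over all k recovers the total number of distinct subsequences, including the empty one.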

The Longest Filled Common Subsequence Problem

Inspired by a recent approach for genome reconstruction from incomplete data, we consider a variant of the longest common subsequence problem for the comparison of two sequences, one of which is incomplete, i.e. it has some missing elements. The new combinatorial problem, called Longest Filled Common Subsequence, given two sequences A and B, and a multiset M of symbols missing in B, asks for a s...

Monotone Subsequences in High-Dimensional Permutations

This paper is part of the ongoing effort to study high-dimensional permutations. We prove the analogue to the Erdős–Szekeres Theorem: For every k ≥ 1, every order-n k-dimensional permutation contains a monotone subsequence of length Ω_k(√n), and this is tight. On the other hand, and unlike the classical case, the longest monotone subsequence in a random k-dimensional permutation of order n is...

On a Speculated Relation Between Chvátal-Sankoff Constants of Several Sequences

It is well known that, when normalized by n, the expected length of a longest common subsequence of d sequences of length n over an alphabet of size σ converges to a constant γ_{σ,d}. We disprove a speculation by Steele regarding a possible relation between γ_{2,d} and γ_{2,2}. In order to do that, we also obtain some new lower bounds for γ_{σ,d} when both σ and d are small integers.
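The constant γ_{σ,2} can be approximated empirically by averaging LCS lengths of random string pairs. Below is a minimal Monte Carlo sketch for the two-sequence case (d = 2) only, with hypothetical function names. Concatenating strings shows that E[LCS] is superadditive in n, so E[LCS]/n approaches γ from below. Finite-n estimates therefore tend to undershoot the limit. For σ = 2 the limit is known numerically to be roughly 0.81.

```python
import random


def lcs_length(a, b) -> int:
    """LCS length by the classic DP, O(len(a)*len(b)) time and O(len(b)) memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def estimate_gamma(sigma: int, n: int, trials: int, seed: int = 0) -> float:
    """Average of LCS/n over `trials` pairs of uniform random strings of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(sigma) for _ in range(n)]
        b = [rng.randrange(sigma) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)


print(lcs_length("ABCBDAB", "BDCABA"))  # 4
print(estimate_gamma(2, 500, 5))
```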

Recent Progress in Algebraic Combinatorics

We survey three recent breakthroughs in algebraic combinatorics. The first is the proof by Knutson and Tao, and later Derksen and Weyman, of the saturation conjecture for Littlewood-Richardson coefficients. The second is the proof of the n! and (n + 1)^(n−1) conjectures by Haiman. The final breakthrough is the determination by Baik, Deift, and Johansson of the limiting behavior of the length of th...

Journal title:
  • Theor. Comput. Sci.

Volume 409

Publication date: 2008