Reconstruction from subwords
Abstract
This paper considers two variants of a combinatorial problem for the set $F_q^n$ of sequences of length $n$ over the alphabet $F_q = \{0, 1, \dots, q-1\}$, together with some applications. The original problem was the following: for a given word $w \in F_q^n$, what is the smallest integer $k$ such that $w$ can be reconstructed if we know all of its subwords of length $k$? This problem was solved by Lothaire [8]. We consider the following variant: the $n$-letter word $w = w_1 \dots w_n$ (called a DNA-word) is composed over an alphabet consisting of $q$ complement pairs, $\{i, \bar{i} : i = 0, \dots, q-1\}$, and $w^*$ denotes its reverse complement, i.e. $w^* = \bar{w}_n \dots \bar{w}_1$. A DNA-word $u$ is called a subword of $w$ if it is a subword of either $w$ or $w^*$. (Equivalently, we identify $w$ and $w^*$.) We want to reconstruct $w$ from its subwords of length $k$. We give a simple proof for $k = n-1$ and apply this result to determine the automorphism group of the poset of DNA-words of length at most $n$, partially ordered by the above subword relation.
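The definitions in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes that "subword" means a scattered subword (a subsequence), and it uses a hypothetical encoding in which the letters 0, ..., q − 1 are stored as themselves and the complement of letter i is stored as i + q. All function names are illustrative.

```python
from itertools import combinations

def complement(letter, q):
    # Assumed encoding: letters 0..q-1, complement of i stored as i + q.
    return (letter + q) % (2 * q)

def reverse_complement(w, q):
    # w* = complement(w_n) ... complement(w_1)
    return tuple(complement(x, q) for x in reversed(w))

def subwords(w, k):
    # All scattered subwords (subsequences) of length k, as a set.
    return {tuple(w[i] for i in idx)
            for idx in combinations(range(len(w)), k)}

def dna_subwords(w, k, q):
    # Length-k subwords of w or of w*, where each subword u is
    # identified with u* by keeping the smaller of the two tuples.
    found = subwords(w, k) | subwords(reverse_complement(w, q), k)
    return {min(u, reverse_complement(u, q)) for u in found}
```

Under this encoding, `dna_subwords(w, k, q)` and `dna_subwords(reverse_complement(w, q), k, q)` return the same set, which reflects the identification of w with w∗. The enumeration is exponential in general and is meant only for experimenting with small words.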
Similar resources
Discovering discrete subword units with binarized autoencoders and hidden-Markov-model encoders
In this paper we address the problem of unsupervised learning of discrete subword units. Our approach is based on Deep Autoencoders (AEs), whose encoding node values are thresholded to subsequently generate a symbolic, i.e., 1-of-K (with K = No. of subwords), representation of each speech frame. We experiment with two variants of the standard AE which we have named Binarized Autoencoder and Hid...
Characterization of a word by its subwords
We consider what is the amount of subwords of a word needed to completely determine the word. More precisely, we study the maximal length such that all words of this length can be uniquely determined by its subwords of a fixed length. The set of subwords of a fixed length is called a spectrum. Four types of spectrums are analyzed: sparse, factor, sparse with multiplicity and factor with multiplicit...
Search Space Reduction for Farsi Printed Subwords Recognition by Position of the Points and Signs
In the field of word recognition, three approaches are used: word isolation, overall shape, and a combination of the two. Most optical recognition methods recognize a word by breaking it into its letters and then recognizing them. This approach faces some problems because of the difficulty of isolating letters and its recognition accuracy in texts with low image quality. Therefo...
Scattered subwords and composition of natural numbers
Special scattered subwords in which the lengths of the gaps are bounded by two natural numbers are considered. For rainbow words the number of such scattered subwords is equal to the number of special restricted compositions of natural numbers in which the components are natural numbers from a given interval. Linear algorithms to compute such numbers are given. We also introduce the concepts of ...