Generalized Unique Reconstruction from Substrings

Abstract

This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases, in which all substrings of pre-defined lengths are read or substrings are read with no overlap, for the single string case, this work studies two extensions of this paradigm. The first extension considers the setup in which consecutive substrings are read with some given minimum overlap. First, an upper bound is provided on the attainable rates of codes that guarantee unique reconstruction. Then, efficient constructions of codes that asymptotically meet that upper bound are presented. In the second extension, we study the setup where multiple strings are reconstructed together. Given the number of strings and their length, we first derive a lower bound on the read substrings' length $\ell$ that is necessary for the existence of multi-strand reconstruction codes with non-vanishing rates. We then present constructions of such codes and show that their rates approach 1 for values of $\ell$ that asymptotically behave like the lower bound.

Similar resources

Minimum Unique Substrings and Maximum Repeats

Unique substrings appear scattered in the stringology literature and have important applications in bioinformatics. In this paper we initiate a study of minimum unique substrings in a given string; that is, substrings that occur exactly once while all their substrings are repeats. We discover a strong duality between minimum unique substrings and maximum repeats which, in particular, allows fas...

Tight Bounds on the Maximum Number of Shortest Unique Substrings

A substring Q of a string S is called a shortest unique substring (SUS) for interval [s, t] in S, if Q occurs exactly once in S, this occurrence of Q contains interval [s, t], and every substring of S which contains interval [s, t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t] all the SUSs...

Reconstructing Strings from Substrings

We consider an interactive approach to DNA sequencing by hybridization, where we are permitted to ask questions of the form "is s a substring of the unknown sequence S?", where s is a specific query string. We are not told where s occurs in S, nor how many times it occurs, just whether or not s is a substring of S. Our goal is to determine the exact contents of S using as few queries as possible. ...

Tight bound on the maximum number of shortest unique substrings

A substring Q of a string S is called a shortest unique substring (SUS) for position p in S, if Q occurs exactly once in S, this occurrence of Q contains position p, and every substring of S which contains position p and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query position p all the SUSs for position p can ...

Finding Characteristic Substrings from Compressed Texts

Text mining from large-scale data is of great importance in computer science. In this paper, we consider fundamental problems on text mining from compressed strings, i.e., computing a longest repeating substring, longest non-overlapping repeating substring, most frequent substring, and most frequent non-overlapping substring from a given compressed string. Also, we tackle the following novel p...

Journal

Journal title: IEEE Transactions on Information Theory

Year: 2023

ISSN: 0018-9448, 1557-9654

DOI: https://doi.org/10.1109/tit.2023.3269124