Compact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth

Authors

  • Bonita McVey
  • Philippe Jacquet
  • Wojciech Szpankowski
Abstract:

Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even though the PATRICIA trie is constructed from statistically independent strings. As a result, we show that the limiting distribution for the depth in a PAT tree built over n suffixes is normal.
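The comparison stated in the abstract can be explored empirically. The following is a minimal simulation sketch, not the paper's method, and it rests on several assumptions not in the original: a binary alphabet, the depth measured for the first string only, strings truncated to a hypothetical window of `pad` symbols, and all function names. It relies on the standard observation that the depth of a leaf in a compacted trie equals the number of distinct longest-common-prefix lengths between that string and the others, provided the strings are pairwise distinct within the window.

```python
import random


def lcp(a, b):
    """Length of the longest common prefix of two sequences."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k


def depth_of_first(strings):
    """Depth of strings[0] in the compacted trie built over `strings`.

    A branching node lies at string depth k on the path to strings[0]
    exactly when some other string agrees with it on k symbols and then
    differs, so the depth is the number of distinct LCP values.
    Assumes the other strings differ from strings[0].
    """
    first = strings[0]
    return len({lcp(first, s) for s in strings[1:]})


def biased_bits(length, p, rng):
    """Memoryless binary source: each symbol is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def pat_depth_sample(n, p, rng, pad=128):
    """Depth of the first suffix among n suffixes of ONE random text.

    `pad` (an assumption of this sketch) truncates each suffix to a
    window long enough that ties are very unlikely.
    """
    text = biased_bits(n + pad, p, rng)
    suffixes = [text[i:i + pad] for i in range(n)]
    return depth_of_first(suffixes)


def patricia_depth_sample(n, p, rng, pad=128):
    """Depth of the first key among n INDEPENDENT random strings."""
    keys = [biased_bits(pad, p, rng) for _ in range(n)]
    return depth_of_first(keys)


def mean_and_variance(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, trials = 200, 0.7, 300
    pat = [pat_depth_sample(n, p, rng) for _ in range(trials)]
    patricia = [patricia_depth_sample(n, p, rng) for _ in range(trials)]
    print("PAT tree depth (mean, var):     ", mean_and_variance(pat))
    print("PATRICIA trie depth (mean, var):", mean_and_variance(patricia))
```

If the abstract's claim holds at this scale, the two empirical means and variances should be close for moderately large n, although a finite simulation of this kind can only suggest, not confirm, the limiting behavior.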

similar resources

Compact Suffix Trees Resemble Patricia Tries: Limiting Distribution of Depth

Wojciech Szpankowski, Dept. of Computer Science, Purdue University, W. Lafayette, IN 47907, U.S.A. Suffix trees are the most frequently used data structure in algorithms on words. Despite this, little is known about their behavior in a probabilistic framework. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. In...

Average profiles, from tries to suffix-trees

We build upon previous work of Fayolle (2004) and Park and Szpankowski (2005) to study asymptotically the average internal profile of tries and of suffix-trees. The binary keys and the strings are built from a Bernoulli source (p, q). We consider the average number p_{k,P}(ν) of internal nodes at depth k of a trie whose number of input keys follows a Poisson law of parameter ν. The Mellin transfor...

Uncommon Suffix Tries

Common assumptions on the source producing the words inserted in a suffix trie with n leaves lead to a ln n height and saturation level. We provide an example of a suffix trie whose height increases faster than a power of n and another one whose saturation level is negligible with respect to ln n. Both are built from VLMC (Variable Length Markov Chain) probabilistic sources and are easily extende...

Linear-size suffix tries

Suffix trees are highly regarded data structures for text indexing and string algorithms [McCreight 76, Weiner 73]. For any given string w of length n = |w|, a suffix tree for w takes O(n) nodes and links. It is often presented as a compacted version of a suffix trie for w, where the latter is the trie (or digital search tree) built on the suffixes of w. Here the compaction process replaces each...

Expected External Profile of PATRICIA Tries

We consider PATRICIA tries on n random binary strings generated by a memoryless source with parameter p ≥ 1/2. For both the symmetric (p = 1/2) and asymmetric cases, we analyze asymptotics of the expected value of the external profile at level k = k(n), defined to be the number of leaves at level k. We study three natural ranges of k with respect to n. For k bounded, the mean profile decays ex...

An Estimation of the Size of Non-Compact Suffix Trees

A suffix tree is a data structure used mainly for pattern matching. It is known that the space complexity of simple suffix trees is quadratic in the length of the string. By a slight modification of the simple suffix trees one gets the compact suffix trees, which have linear space complexity. The motivation of this paper is the question whether the space complexity of simple suffix trees is qua...

Journal title

volume 3

pages 139-148

publication date 2004-11
