Analysis of the Multiplicity Matching Parameter in Suffix Trees

نویسندگان

  • Mark Daniel Ward
  • Wojciech Szpankowski
چکیده

In a suffix tree, the multiplicity matching parameter (MMP) Mn is the number of leaves in the subtree rooted at the branching point of the (n + 1)st insertion. Equivalently, the MMP is the number of pointers into the database in the Lempel-Ziv ’77 data compression algorithm. We prove that the MMP asymptotically follows the logarithmic series distribution plus some fluctuations. In the proof we compare the distribution of the MMP in suffix trees to its distribution in tries built over independent strings. Our results are derived by both probabilistic and analytic techniques of the analysis of algorithms. In particular, we utilize combinatorics on words, bivariate generating functions, pattern matching, recurrence relations, analytical poissonization and depoissonization, the Mellin transform, and complex analysis.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Compact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth

Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even...

متن کامل

Suffix Trees and Suffix Arrays

Iowa State University 1.1 Basic Definitions and Properties . . . . . . . . . . . . . . . . . . . . 1-1 1.2 Linear Time Construction Algorithms . . . . . . . . . . . . . 1-4 Suffix Trees vs. Suffix Arrays • Linear Time Construction of Suffix Trees • Linear Time Construction of Suffix Arrays • Space Issues 1.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ...

متن کامل

Bidirectional Construction of Suffix Trees

String matching is critical in information retrieval since in many cases information is stored and manipulated as strings. Constructing and utilizing a suitable data structure for a text string, we can solve the string matching problem efficiently. Such a structure is called an index structure. Suffix trees are certainly the most widely-known and extensively-studied structure of this kind. In t...

متن کامل

Skriptum VL Text Indexing

In this section we will introduce suffix trees, which, among many other things, can be used to solve the string matching task (find pattern P of length m in a text T of length n in O(n + m) time). We already know that other methods (Boyer-Moore, e.g.) solve this task in the same time. So why do we need suffix trees? The advantage of suffix trees over the other string-matching algorithms (Boyer-...

متن کامل

An Estimation of the Size of Non-Compact Suffix Trees

A suffix tree is a data structure used mainly for pattern matching. It is known that the space complexity of simple suffix trees is quadratic in the length of the string. By a slight modification of the simple suffix trees one gets the compact suffix trees, which have linear space complexity. The motivation of this paper is the question whether the space complexity of simple suffix trees is qua...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005