Asymptotic Behavior of the Height in a Digital Search Tree and the Longest Phrase of the Lempel-Ziv Scheme

Authors

  • Charles Knessl
  • Wojciech Szpankowski
Abstract

Wojciech Szpankowski†, Department of Computer Science, Purdue University, W. Lafayette, IN 47907, U.S.A., spa@cs.purdue.edu

We study the height of a digital search tree (DST for short) built from n random strings generated by an unbiased memoryless source (i.e., all symbols are equally likely). We shall argue that the height of such a tree is equivalent to the length of the longest phrase in the Lempel-Ziv parsing scheme that partitions a random sequence into n phrases. We also analyze the longest phrase in the Lempel-Ziv scheme in which a string of fixed length m is parsed into a random number of phrases. In the course of our analysis, we shall identify four natural regions of the height distribution and characterize them asymptotically for large n. In particular, for the region where most of the probability mass is concentrated, the asymptotic distribution of the height exhibits an exponential of a Gaussian distribution (with an oscillating term) around the most probable value $k_1 = \lfloor \log_2 n + \sqrt{2\log_2 n} - \log_2(\sqrt{2\log_2 n}) + \frac{1}{\log 2} - \frac{1}{2} \rfloor + 1$. More precisely, we shall prove that the asymptotic distribution of the height of a digital search tree is concentrated either on the one point $k_1$ or on the two points $k_1 - 1$ and $k_1$, which actually proves a (slightly modified) conjecture of Kesten quoted in [2]. Finally, we compare our findings for DSTs with the asymptotic distributions of the height (recently obtained by us) for other digital trees such as tries and PATRICIA tries. We derive these results by a combination of analytic methods, such as generating functions, the Laplace transform, and the saddle point method, and ideas of applied mathematics, such as linearization, asymptotic matching, and the WKB method. We also present detailed numerical verification of our results.

Key Words: Digital search trees, Lempel-Ziv algorithm, height distribution, longest phrase distribution, Laplace transform, saddle point method, matched asymptotics, linearization, WKB method, elliptic theta function.

*This work was supported by DOE Grant DE-FG02-96ER25168.
†The work of this author was supported by NSF Grants NCR-9415491 and CCR-9804760.
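The correspondence between DST height and the longest Lempel-Ziv phrase can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method: the function names are hypothetical, tree nodes are identified by their bit path from the root, and the root is taken to be at depth 0. The k1 helper transcribes the formula quoted in the abstract under the assumption that its constant term is 1/ln 2 - 1/2, with the unsubscripted log read as the natural logarithm.

```python
import math
import random


def insert(occupied, next_bit):
    """Insert one key into a DST whose nodes are identified by root paths.

    Walks down from the root, consuming one bit per level, until a free
    position is found. Returns the depth of the newly created node.
    """
    path = ""
    while path in occupied:
        path += next_bit()
    occupied.add(path)
    return len(path)


def dst_height(n, rng):
    """Height of a DST built from n independent unbiased random strings."""
    occupied = set()  # the first key lands in the root (depth 0)
    return max(insert(occupied, lambda: rng.choice("01")) for _ in range(n))


def lz_longest_phrase(n, rng):
    """Longest phrase when a random bit sequence is parsed into n phrases.

    Same insertion rule as above, except that the root already holds the
    empty phrase. For a memoryless source each phrase consumes fresh,
    independent bits, so drawing them on demand is equivalent to reading
    them from one long sequence.
    """
    occupied = {""}
    return max(insert(occupied, lambda: rng.choice("01")) for _ in range(n))


def k1(n):
    """Most probable height per the abstract's formula (assumed reading).

    Requires n >= 2 so that the inner logarithm is defined.
    """
    log_n = math.log2(n)
    root = math.sqrt(2 * log_n)
    return math.floor(log_n + root - math.log2(root) + 1 / math.log(2) - 0.5) + 1


if __name__ == "__main__":
    rng = random.Random(2000)  # arbitrary seed, for repeatability only
    n = 1 << 14
    print("DST height:        ", dst_height(n, rng))
    print("LZ longest phrase: ", lz_longest_phrase(n, rng))
    print("k1(n):             ", k1(n))
```

Because the two routines differ only in whether the root is pre-occupied, the simulated values should be close to each other for large n. The depth convention used here may differ by a constant from the one in the paper, so the comparison with k1 is indicative only.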

Similar resources

On Generalized Digital Search Trees with Applications to a Generalized Lempel-Ziv

The goal of this research is twofold: (i) to analyze generalized digital search trees, and (ii) to derive the average profile (i.e., phrase length) of a generalization of the well-known parsing algorithm due to Lempel and Ziv. In the generalized Lempel-Ziv parsing scheme, one partitions a sequence of symbols from a finite alphabet into phrases such that the new phrase is the longest substring seen...

Traveling Front Solutions to Directed Diffusion Limited Aggregation, Digital Search Trees and the Lempel-Ziv Data Compression Algorithm

We use the traveling front approach to derive exact asymptotic results for the statistics of the number of particles in a class of directed diffusion-limited aggregation models on a Cayley tree. We point out that some aspects of these models are closely connected to two different problems in computer science, namely, the digital search tree problem in data structures and the Lempel-Ziv algorith...

The expected profile of digital search trees

A digital search tree (DST) is a fundamental data structure on words that finds various applications from the popular Lempel-Ziv’78 data compression scheme to distributed hash tables. The profile of a DST measures the number of nodes at the same distance from the root; it depends on the number of stored strings and the distance from the root. Most parameters of DST (e.g., depth, height, fillup)...

The Expected Profile of Digital Search Trees (March 24, 2011)

A digital search tree (DST) is a fundamental data structure on words that finds various applications from the popular Lempel-Ziv’78 data compression scheme to distributed hash tables. The profile of a DST measures the number of nodes at the same distance from the root; it depends on the number of stored strings and the distance from the root. Most parameters of DST (e.g., depth, height, fillup)...

The Expected Profile of Digital Search Trees (December 10, 2009)

A digital search tree (DST) is a fundamental data structure on words that finds a myriad of applications, from the popular Lempel-Ziv’78 data compression scheme to distributed hash tables. It is a digital tree in which strings (keys, words) are stored directly in (internal) nodes. The profile of a DST measures the number of nodes at the same distance from the root; it is a function of the number o...

Journal:
  • SIAM J. Comput.

Volume 30

Publication date 2000