Breaking a Time-and-Space Barrier in Constructing Full-Text Indices

Authors

  • Wing-Kai Hon
  • Kunihiko Sadakane
  • Wing-Kin Sung
Abstract

Suffix trees and suffix arrays are the most prominent full-text indices, and their construction algorithms are well studied. In the literature, the fastest algorithm runs in O(n) time but requires O(n log n)-bit working space, where n denotes the length of the text. On the other hand, the most space-efficient algorithm requires O(n)-bit working space but runs in O(n log n) time. It was open whether these indices can be constructed in both o(n log n) time and o(n log n)-bit working space. This paper breaks this time-and-space barrier under the unit-cost word RAM model. We give an algorithm for constructing the suffix array that takes O(n) time and O(n)-bit working space for texts with constant-size alphabets. Both the time and the space bounds are optimal. For constructing the suffix tree, our algorithm requires O(n log^ε n) time and O(n)-bit working space for any 0 < ε < 1. In addition, our algorithm can be adapted to build other existing full-text indices, such as the Compressed Suffix Tree, Compressed Suffix Arrays, and the FM-index. We also study the general case where the size of the alphabet Σ is not constant. Our algorithm can construct a suffix array and a suffix tree using optimal O(n log |Σ|)-bit working space while running in O(n log log |Σ|) time and O(n(log^ε n + log |Σ|)) time, respectively. These are the first algorithms that achieve o(n log n) time with optimal working space. Moreover, for the special case where log |Σ| = O((log log n)^(1−ε)), we can speed up our suffix array construction algorithm to the optimal O(n) time.
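As a rough illustration of the objects discussed in the abstract, the sketch below builds a suffix array by directly sorting suffixes and compares the size of a plain suffix array with the size of the text itself. This is not the authors' algorithm: the naive sort is far slower and uses much more working space than the bounds stated above. The function names (naive_suffix_array, plain_suffix_array_bits, text_bits) are hypothetical and chosen only for this example.

```python
import math


def naive_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.

    Illustration only: sorting suffixes directly is much slower than the
    linear-time construction the abstract describes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def plain_suffix_array_bits(n: int) -> int:
    """Bits needed to store n positions at ceil(log2 n) bits each."""
    if n < 2:
        return n
    return n * math.ceil(math.log2(n))


def text_bits(n: int, sigma: int) -> int:
    """Bits needed to store a text of length n over an alphabet of size sigma."""
    if sigma < 2:
        return n
    return n * math.ceil(math.log2(sigma))


if __name__ == "__main__":
    print(naive_suffix_array("banana"))   # [5, 3, 1, 0, 4, 2]
    n = 1 << 20                           # about one million characters
    print(plain_suffix_array_bits(n))     # n log n bits for the plain array
    print(text_bits(n, 4))                # n log |Σ| bits for a 4-letter text
```

For a 4-letter alphabet and n = 2^20, the plain suffix array takes ten times as many bits as the text, which is the gap between O(n log n)-bit and O(n log |Σ|)-bit working space that the abstract refers to.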


Similar resources

Space as a Semiotic Object: A Three-Dimensional Model of Vertical Structure of Space in Calvino’s Invisible Cities

Following the “spatial turn” of the last 3 decades in humanities and social sciences and the structure of semiotic object, this research studies space as the main semiotic object of Calvino’s (1972) Invisible Cities. Significance of this application resides in examining the possibility of providing a more concrete methodology based on the integration of Zoran’s (1984) 3 vertical levels of const...


On the Relation Between the Large-Scale Tropospheric Circulation and Air Quality in Tehran

The large-scale tropospheric circulation can play a controlling role in the accumulation and ventilation of air pollutants. It thus impacts air quality in large urban areas. This paper investigates the statistical relations between the dynamical indices related to circulation in the troposphere and visibility as a surrogate for air pollution in the urban area of Tehran for the climatological pe...


The Identity of Moses in Surah Al-Qasas with Reference to Time and Space

The question of identity in a narrative text is one of the most influential questions that need further study. The variations in the factors that may affect the concept of identity add to the complexity of the narrative text. The study aims at analyzing the main phases, stages, themes and events of Moses’ life story as part of the narrative discourse. The effects of time and place on the main e...


Characterization of $2\times 2$ full diversity space-time codes and inequivalent full rank spaces

In wireless communication systems, space-time codes are applied to encode data when multiple antennas are used in the receiver and transmitter. The concept of diversity is very crucial in designing space-time codes. In this paper, using the equivalent definition of full diversity space-time codes, we first characterize all real and complex $2\times 2$ rate one linear dispersion space-...


Smaller and Faster Lempel-Ziv Indices

Given a text T[1..u] over an alphabet of size σ = O(polylog(u)) and with k-th order empirical entropy H_k(T), we propose a new compressed full-text self-index based on the Lempel-Ziv (LZ) compression algorithm, which replaces T with a representation requiring about three times the size of the compressed text, i.e., (3 + ε)uH_k(T) + o(u log σ) bits, for any ε > 0 and k = o(log_σ u), and in addition...



Journal title:
  • SIAM J. Comput.

Volume 38  Issue 

Pages  -

Publication date 2003