Uniquely decodable n-gram embeddings

Author

  • Aryeh Kontorovich
Abstract

We define the family of n-gram embeddings from strings over a finite alphabet into a semimodule over $\mathbb{N}$. We classify all elements of this semimodule that are valid images of strings under such embeddings, as well as all elements whose inverse image consists of exactly one string (we call such elements uniquely decodable). We prove that for a fixed alphabet, the set of all strings whose image is uniquely decodable is a regular language. © 2004 Elsevier B.V. All rights reserved.
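As a rough illustration of the notions in the abstract, the sketch below treats an n-gram embedding as a count of each contiguous length-n substring and tests unique decodability by brute force. This is an assumption made for illustration: the paper's precise definition (for example, any boundary or padding convention) is not given here, and the function names `ngram_embedding`, `preimage`, and `is_uniquely_decodable` are hypothetical. The search assumes the input string has length at least n, in which case every string with the same image has the same length.

```python
from collections import Counter
from itertools import product


def ngram_embedding(s, n):
    """Count each contiguous length-n substring of s.

    Assumption: plain substring counts with no padding; the paper's
    exact definition may differ.
    """
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def preimage(s, n, alphabet):
    """Return all strings over `alphabet` with the same n-gram counts as s.

    Brute force over strings of the same length as s, which is exhaustive
    when len(s) >= n. Exponential in len(s); for small examples only.
    """
    target = ngram_embedding(s, n)
    candidates = ("".join(t) for t in product(alphabet, repeat=len(s)))
    return [t for t in candidates if ngram_embedding(t, n) == target]


def is_uniquely_decodable(s, n, alphabet):
    """True if s is the only string whose n-gram counts match its own."""
    return len(preimage(s, n, alphabet)) == 1
```

Under these assumptions, with bigrams over the alphabet `"ab"`, the string `"abab"` is the only string with its bigram counts, whereas `"aabba"` shares its counts with `"abbaa"`, `"baabb"`, and `"bbaab"` and so is not uniquely decodable in this sense.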


Similar resources

On the ratio of prefix codes to all uniquely decodable codes with a given length distribution

We investigate the ratio $\rho_{n,L}$ of prefix codes to all uniquely decodable codes over an n-letter alphabet and with length distribution L. For any integers n ≥ 2 and m ≥ 1, we construct a lower bound and an upper bound for $\inf_L \rho_{n,L}$, the infimum taken over all sequences L of length m for which the set of uniquely decodable codes with length distribution L is non-empty. As a result, we obtain that ...


Multimodal Word Distributions

Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic i...


Mixed Membership Word Embeddings for Computational Social Science

Word embeddings improve the performance of NLP systems by revealing the hidden structural relationships between words. These models have recently risen in popularity due to the performance of scalable algorithms trained in the big data setting. Despite their success, word embeddings have seen very little use in computational social science NLP tasks, presumably due to their reliance on big data...


On Embeddings of $\ell_1^k$ from Locally Decodable Codes

We show that any q-query locally decodable code (LDC) gives a copy of $\ell_1^k$ with small distortion in the Banach space of q-linear forms on $\ell_{p_1}^N \times \cdots \times \ell_{p_q}^N$, provided $1/p_1 + \cdots + 1/p_q \le 1$ and where k, N, and the distortion are simple functions of the code parameters. We exhibit the copy of $\ell_1^k$ by constructing a basis for it directly from “smooth” LDC decoders. Based on this, we give alternative...


On the set of uniquely decodable codes with a given sequence of code word lengths

For every natural number n ≥ 2 and every finite sequence L of natural numbers, we consider the set $UD_n(L)$ of all uniquely decodable codes over an n-letter alphabet with the sequence L as the sequence of code word lengths, as well as its subsets $PR_n(L)$ and $FD_n(L)$ consisting of, respectively, the prefix codes and the codes with finite delay. We derive the estimation for the quotient |UDn(L)|/|PRn...



Journal:
  • Theor. Comput. Sci.

Volume 329

Publication date 2004