Beyond Word N-Grams (…607016v1, 13 Jul 1996)
Authors
Abstract
We describe, analyze, and evaluate experimentally a new probabilistic model for word-sequence prediction in natural language based on prediction suffix trees (PSTs). By using efficient data structures, we extend the notion of PST to unbounded vocabularies. We also show how to use a Bayesian approach based on recursive priors over all possible PSTs to efficiently maintain tree mixtures. These mixtures have provably and practically better performance than almost any single model. We evaluate the model on several corpora. The low perplexity achieved by relatively small PST mixture models suggests that they may be an advantageous alternative, both theoretically and practically, to the widely used n-gram models.
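As a rough illustration of the idea of mixing predictions from suffix contexts of different lengths, the following minimal sketch may help. It is not the authors' algorithm: it uses a fixed mixing weight where the abstract describes Bayesian recursive priors over PSTs, and the names `SuffixContextModel`, `mix_weight`, `vocab_size_guess`, and `perplexity` are hypothetical. Contexts are stored in hash maps, so the set of words does not need to be fixed in advance.

```python
import math
from collections import defaultdict


class SuffixContextModel:
    """Toy next-word predictor mixing estimates from suffix contexts.

    Assumption: a fixed mixing weight stands in for the Bayesian
    tree-mixture weights described in the abstract.
    """

    def __init__(self, max_depth=2, mix_weight=0.5, vocab_size_guess=1000):
        self.max_depth = max_depth
        self.mix_weight = mix_weight
        self.vocab_size_guess = vocab_size_guess  # only used for the uniform base
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> word -> count
        self.totals = defaultdict(int)                       # context -> total count

    def _contexts(self, history):
        # Suffixes of the history, shortest first: (), (w[-1],), (w[-2], w[-1]), ...
        max_len = min(self.max_depth, len(history))
        return [tuple(history[len(history) - k:]) for k in range(max_len + 1)]

    def update(self, history, word):
        # Record the observed word under every suffix context of the history.
        for ctx in self._contexts(history):
            self.counts[ctx][word] += 1
            self.totals[ctx] += 1

    def prob(self, history, word):
        # Start from a uniform estimate, then refine with longer contexts.
        p = 1.0 / self.vocab_size_guess
        for ctx in self._contexts(history):
            total = self.totals.get(ctx, 0)
            if total == 0:
                break  # longer suffixes of an unseen context are unseen too
            local = self.counts[ctx].get(word, 0) / total
            p = self.mix_weight * local + (1.0 - self.mix_weight) * p
        return p


def perplexity(model, words):
    # Per-word perplexity: 2 ** (average negative log2 probability).
    log_sum = sum(math.log2(model.prob(words[:t], w)) for t, w in enumerate(words))
    return 2 ** (-log_sum / len(words))


# Tiny usage example on an alternating two-word sequence.
corpus = "a b a b a b".split()
model = SuffixContextModel(max_depth=1, mix_weight=0.5, vocab_size_guess=2)
for t, w in enumerate(corpus):
    model.update(corpus[:t], w)

p_b_after_a = model.prob(["a"], "b")
p_a_after_a = model.prob(["a"], "a")
```

In this toy setting the longer context `("a",)` sharpens the prediction toward `"b"`, while the shorter contexts keep some probability mass on `"a"`. The estimates sum to one only when `vocab_size_guess` matches the actual number of distinct words.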