Approximate N-Gram Markov Model for Natural Language Generation
Authors
Abstract
This paper proposes an approximate n-gram Markov model for bag generation. Directed word-association pairs with distances are used to approximate the (n-1)-gram and n-gram training tables. The model has the parameters of a word association model and the merits of both the word association model and the Markov model. The training knowledge acquired for bag generation can also be applied to lexical selection in machine translation design.
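To illustrate the bag-generation task itself, the following is a minimal sketch of ordering a bag of words with a plain bigram Markov model trained on counted sentences. This is an illustrative baseline, not the paper's method: the paper approximates the n-gram tables via word-association pairs, whereas this sketch uses exact bigram counts with add-one smoothing; all function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations
import math

def train_bigrams(sentences):
    """Count unigrams and bigrams from tokenized sentences, with boundary markers."""
    uni, bi = defaultdict(int), defaultdict(int)
    for toks in sentences:
        toks = ["<s>"] + toks + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            uni[a] += 1
            bi[(a, b)] += 1
        uni[toks[-1]] += 1  # count the final </s> token
    return uni, bi

def score(order, uni, bi):
    """Log-probability of one word order under the bigram model (add-one smoothing)."""
    toks = ["<s>"] + list(order) + ["</s>"]
    vocab = len(uni)
    return sum(math.log((bi[(a, b)] + 1) / (uni[a] + vocab))
               for a, b in zip(toks, toks[1:]))

def generate(bag, uni, bi):
    """Bag generation by exhaustive search: return the highest-scoring ordering."""
    return max(permutations(bag), key=lambda o: score(o, uni, bi))
```

Exhaustive search over permutations is exponential in the bag size; it only serves to make the scoring criterion concrete.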
Similar Resources
Unsupervised Learning on an Approximate Corpus
Unsupervised learning techniques can take advantage of large amounts of unannotated text, but the largest text corpus (the Web) is not easy to use in its full form. Instead, we have statistics about this corpus in the form of n-gram counts (Brants and Franz, 2006). While n-gram counts do not directly provide sentences, a distribution over sentences can be estimated from them in the same way tha...
Learning Representations for Weakly Supervised Natural Language Processing Tasks
Finding the right representations for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This article investigates novel techniques for extracting features from n-gram models, Hidden Markov Models, and other statistical language models, including a novel Partial Lattice Markov Random Field model. Experiments on part-of-speech tagging and...
Bayesian Variable Order n-gram Language Model based on Pitman-Yor Processes
This paper proposes a variable-order n-gram language model by extending a recently proposed model based on the hierarchical Pitman-Yor processes. By introducing a stochastic process on an infinite-depth suffix tree, we can infer the hidden n-gram context from which each word originated. Experiments on standard large corpora showed the validity and efficiency of the proposed model. Our architecture is ...
Estimating Comma Placement in Natural Language
We study the feasibility of identifying comma locations using both n-gram models and stochastic context-free grammars (SCFGs). Specifically, our algorithms take an input sentence without commas and return the positions where commas should be inserted, along with probability or confidence estimates. This can be generalized to correcting comma placement with minor modifications. However, we focus...
Intra-sentence Punctuation Insertion in Natural Language Generation
We describe a punctuation insertion model used in the sentence realization module of a natural language generation system for English and German. The model is based on a decision tree classifier that uses linguistically sophisticated features. The classifier outperforms a word n-gram model trained on the same data.
Journal title: CoRR
Volume: abs/cmp-lg/9408012
Pages: -
Publication date: 1994