Grammatical Bigrams

Author

  • Mark A. Paskin
Abstract

Unsupervised learning algorithms have been derived for several statistical models of English grammar, but their computational complexity makes applying them to large data sets intractable. This paper presents a probabilistic model of English grammar that is much simpler than conventional models, but which admits an efficient EM training algorithm. The model is based upon grammatical bigrams, i.e., syntactic relationships between pairs of words. We present the results of experiments that quantify the representational adequacy of the grammatical bigram model, its ability to generalize from labelled data, and its ability to induce syntactic structure from large amounts of raw text.
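As a rough illustration of what a "grammatical bigram" is, the sketch below treats a parse as a set of (head, dependent) word pairs and scores it by summing the log-probabilities of those pairs. It covers only the labelled-data setting and uses plain relative-frequency estimates. The function names, the toy parses, and the choice to condition only on the head word are assumptions made for this example; the paper's actual parametrization and its EM training algorithm are not reproduced here.

```python
import math
from collections import Counter


def estimate_link_probs(parses):
    """Relative-frequency estimate of P(dependent | head) from labelled links.

    `parses` is a list of parses; each parse is a list of (head, dependent)
    word pairs. This representation is hypothetical, chosen for illustration.
    """
    pair_counts = Counter()
    head_counts = Counter()
    for links in parses:
        for head, dep in links:
            pair_counts[(head, dep)] += 1
            head_counts[head] += 1
    return {pair: n / head_counts[pair[0]] for pair, n in pair_counts.items()}


def parse_log_prob(links, probs):
    """Log-probability of a parse as the sum over its head-dependent links.

    Raises KeyError for a link never seen in training (no smoothing here).
    """
    return sum(math.log(probs[link]) for link in links)


# Toy labelled data: two tiny parses, each a list of (head, dependent) links.
toy_parses = [
    [("ate", "dog"), ("dog", "the"), ("ate", "bone")],
    [("ate", "cat"), ("cat", "the"), ("ate", "fish")],
]

probs = estimate_link_probs(toy_parses)
score = parse_log_prob(toy_parses[0], probs)
```

Because each link contributes one independent term, the score of a parse decomposes over word pairs. A decomposition of this kind is what makes efficient dynamic-programming parsers possible for pairwise models in general.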

Similar resources

Synthetic Grammar Learning: Implicit Rule Abstraction or Explicit Fragmentary Knowledge?

3 experiments were designed to demonstrate that classifying new letter strings as grammatical (i.e., conforming to a set of rules called a synthetic grammar) or ungrammatical may proceed from fragmentary conscious knowledge of the bigrams constituting the grammatical strings displayed in the study phase, rather than from an unconscious structured representation of the grammar, as Reber (1989) c...

Author identification in short texts

Most research on author identification considers large texts. Little research has been done on author identification for short texts, even though short texts have become common since the rise of digital media. The anonymous nature of internet applications makes it possible to use the internet for illegitimate purposes. In these cases, it can be very useful to be able to predict who the author of a mess...

Cubic-time Parsing and Learning Algorithms for Grammatical Bigram Models

This technical report presents a probabilistic model of English grammar that is based upon “grammatical bigrams”, i.e., syntactic relationships between pairs of words. Because of its simplicity, the grammatical bigram model admits cubic-time parsing and unsupervised learning algorithms, which are described in detail.

Discarding impossible events from statistical language models

This paper describes a method for detecting impossible bigrams in a space of V^2 bigrams, where V is the size of the vocabulary. The idea is to discard all the ungrammatical events, which are impossible in well-written text, and consequently to expect an improvement in the language model. We also expect, in speech recognition, to reduce the complexity of the search algorithm by making less com...

A Voice Dictation System for a Million-Word Czech Vocabulary

The paper describes a set of techniques developed for discrete dictation within a vocabulary that contains up to a million entries, which is one of the main challenges in highly inflected languages like Czech. We present our approach to building an efficiently coded tree lexicon with suffix sub-trees and morphologic classification. Acoustic modeling is based on either monophone, diphone, or tri...

Publication date: 2001