Integrating a context-dependent phrase grammar in the variable n-gram framework

Authors

  • Man-Hung Siu
  • Mari Ostendorf
Abstract

This paper focuses on the learning of multi-word lexical units, or phrases, and how to model them within the variable n-gram framework. We introduce the notion of context-dependent phrases and suggest an algorithm for unsupervised learning of phrases. We also propose an approach to integrating a phrase grammar and a variable n-gram without the need to handle multi-word lexical items explicitly. The combined variable n-gram phrase grammar improves recognition accuracy on the Switchboard corpus over both the baseline trigram and the variable n-gram alone.

Similar resources

Integration of supra-lexical linguistic models with speech recognition using shallow parsing and finite state transducers

This paper proposes a layered Finite State Transducer (FST) framework integrating hierarchical supra-lexical linguistic knowledge into speech recognition based on shallow parsing. The shallow parsing grammar is derived directly from the full-fledged grammar for natural language understanding, and augmented with top-level n-gram probabilities and phrase-level context-dependent probabilities, whi...

Automatic Acquisition of Language Model based on Head-Dependent Relation between Words

Language modeling associates a sequence of words with an a priori probability and is a key part of many natural language applications such as speech recognition and statistical machine translation. In this paper, we present a language model based on a simple kind of dependency grammar. The grammar consists of head-dependent relations between words and can be learned automatically from a...

Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions

This paper presents an automatic phrase boundary labeling method for speech synthesis database annotation using context-dependent hidden Markov models (CD-HMMs) and n-gram prior distributions. At the training stage, CD-HMMs are built to describe the conditional distribution of acoustic features given phonetic label and phrase boundary. In addition, n-gram models are estimated to represent the prior ...

Combination of CFG and N-gram Grammar Lea

SGStudio is a grammar authoring tool that eases semantic grammar development. It is capable of integrating different information sources and learning from annotated examples to induct CFG rules. In this paper, we investigate a modification to its underlying model by replacing CFG rules with n-gram statistical models. The new model is a composite of HMM and CFG. The advantages of the new model i...

Robust N-gram Based Syntactic Analysis Using Segmentation Words

We describe an N-gram based syntactic analysis using a dependency grammar. Instead of generalizing syntactic rules, N-gram information of parts of speech is used to segment a sequence of words into two clauses. A special part of speech, called segmentation word, which corresponds to the beginning or end symbol of clauses is introduced to express a sentence structure. Segmentation words for each...


Publication date: 2000