Prosodic features for a maximum entropy language model

نویسندگان

  • Oscar Chan
  • Roberto Togneri
چکیده

A statistical language model attempts to characterise the patterns present in a natural language as a probability distribution defined over word sequences. Typically, they are trained using word co-occurrence statistics from a large sample of text. In some language modelling applications, such as automatic speech recognition (ASR), the availability of acoustic data provides an additional source of knowledge. This contains, amongst other things, the melodic and rhythmic aspects of speech referred to as prosody. Although prosody has been found to be an important factor in human speech recognition, its use in ASR has been limited. The goal of this research is to investigate how prosodic information can be employed to improve the language modelling component of a continuous speech recognition system. Because prosodic features are largely suprasegmental, operating over units larger than the phonetic segment, the language model is an appropriate place to incorporate such information. The prosodic features and standard language model features are combined under the maximum entropy framework, which provides an elegant solution to modelling information obtained from multiple, differing knowledge sources. We derive features for the model based on perceptually transcribed Tones and Break Indices (ToBI) labels, and analyse their contribution to the word recognition task. While ToBI has a solid foundation in linguistic theory, the need for human transcribers conflicts with the statistical model’s requirement for a large quantity of training data. We therefore also examine the applicability of features which can be automatically extracted from the speech signal. We develop representations of an utterance’s prosodic context using fundamental frequency, energy and duration features, which can be directly incorporated into the model without the need for manual labelling. Dimensionality reduction techniques are also explored with the aim of reducing the computational costs associated with training a maximum entropy model. Experiments on a prosodically transcribed corpus show that small but statistically significant reductions to perplexity and word error rates can be obtained by using both manually transcribed and automatically extracted features.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploiting Acoustic and Syntactic Features for Prosody Labeling in a Maximum Entropy Framework

In this paper we describe an automatic prosody labeling framework that exploits both language and speech information. We model the syntactic-prosodic information with a maximum entropy model that achieves an accuracy of 85.2% and 91.5% for pitch accent and boundary tone labeling on the Boston University Radio News corpus. We model the acousticprosodic stream with two different models, one a max...

متن کامل

Tree Mapping Template for Prosodic Phrase Bound-ary Predication

This paper presents a novel method driven by tree mapping template (TMT) which improve the accuracy of prosodic phrase boundary prediction. The TMT is capable of capturing the isomorphic relation between non-terminal nodes in hierarchical prosodic tree and nodes in binary tree approximation, performing pruning at the decoding phase and revising the baseline maximum entropy model with boosting m...

متن کامل

Prosodic Word Prediction Using a Maximum Entropy Approach

As the basic prosodic unit, the prosodic word influences the naturalness and the intelligibility greatly. Although the research shows that the lexicon word are greatly different from the prosodic word, the lexicon word still provides the important cues for the prosodic word forming. The rhythm constraint is another important factor for the prosodic word prediction. Some lexicon word length patt...

متن کامل

Maximum Entropy Model for Punctuat

In this paper we develop a maximum-entropy based method for annotating spontaneous conversational speech with punctuation. The goal of this task is to make automatic transcriptions more readable by humans, and to render them into a form that is useful for subsequent natural language processing and discourse analysis. Our basic approach is to view the insertion of punctuation as a form of taggin...

متن کامل

Combining lexical, syntactic and prosodic cues for improved online dialog act tagging

Prosody is an important cue for identifying dialog acts. In this paper, we show that modeling the sequence of acoustic– prosodic values as n-gram features with a maximum entropy model for dialog act (DA) tagging can perform better than conventional approaches that use coarse representation of the prosodic contour through summative statistics of the prosodic contour. The proposed scheme for expl...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006