Unsupervised prominence prediction for speech synthesis

Authors

  • Mahnoosh Mehrabani
  • Taniya Mishra
  • Alistair Conkie

Abstract

We propose an unsupervised prominence prediction method for expressive speech synthesis. Prominence patterns are learned by statistical analysis of prosodic features extracted from speech data. The advantages of our unsupervised, data-driven prominence prediction include easy adaptation to new speakers, speech styles, and even languages, without requiring expert knowledge or complicated linguistic rules. In this approach, prominence-predictive prosodic features are first extracted at the foot level. The extracted features are then clustered, with each cluster representing a prominence level. The optimal number of perceptually distinct prominence levels is determined from the just-noticeable differences (JNDs) of the prosodic features. Finally, the proposed prominence prediction is applied to prosody prediction for unit selection speech synthesis. Perceptual evaluation results show a preference for 4-level unsupervised prominence prediction over a rule-based baseline in terms of the naturalness and expressiveness of the synthesized speech.
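The abstract names the pipeline steps but not the concrete features, clustering algorithm, or JND criterion. The following is a minimal sketch under stated assumptions, not the authors' implementation: each foot is described by a hypothetical two-feature vector (F0 excursion in semitones, duration in ms), clustering is plain k-means with deterministic seeding, and the number of levels is increased until some pair of cluster centroids differs by less than one JND in every feature. The JND values and the helper names (`kmeans`, `distinguishable`, `choose_levels`, `assign_levels`) are illustrative only.

```python
# Hypothetical sketch: foot-level prosodic features -> k-means clusters ->
# number of prominence levels chosen by a JND criterion. Feature choice,
# JND values, and the stopping rule are assumptions, not from the paper.
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points: List[Vector], k: int, iters: int = 100) -> List[Vector]:
    """Deterministic k-means; seeds are evenly spaced points in sorted order."""
    ordered = sorted(points)
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [ordered[round(i * step)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _dist2(p, centroids[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [_mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    # Sort so that level 0 has the lowest value of the first feature.
    return sorted(centroids)


def distinguishable(a: Vector, b: Vector, jnd: Vector) -> bool:
    """True if a and b differ by at least one JND in at least one feature."""
    return any(abs(x - y) >= t for x, y, t in zip(a, b, jnd))


def choose_levels(points: List[Vector], jnd: Vector, max_k: int = 6) -> List[Vector]:
    """Grow k until some pair of centroids is no longer perceptually distinct."""
    best = [_mean(points)]
    for k in range(2, max_k + 1):
        centroids = kmeans(points, k)
        if not all(distinguishable(centroids[i], centroids[j], jnd)
                   for i in range(k) for j in range(i + 1, k)):
            break
        best = centroids
    return best


def assign_levels(points: List[Vector], centroids: List[Vector]) -> List[int]:
    """Label each foot with the index of its nearest centroid (prominence level)."""
    return [min(range(len(centroids)), key=lambda c: _dist2(p, centroids[c]))
            for p in points]


if __name__ == "__main__":
    # Synthetic feet: (F0 excursion in semitones, foot duration in ms).
    feet = [(1.0, 150.0), (1.2, 155.0), (0.8, 145.0),
            (4.0, 200.0), (4.2, 205.0), (3.8, 195.0),
            (8.0, 260.0), (8.2, 265.0), (7.8, 255.0)]
    jnd = (1.5, 20.0)  # hypothetical JNDs per feature
    levels = choose_levels(feet, jnd)
    print(len(levels))                  # 3
    print(assign_levels(feet, levels))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Features are kept in raw units here so that the JND thresholds apply directly; with real data one would likely standardize the features before clustering, in which case the JNDs must be rescaled by the same factors.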

Similar resources

Prominence-Based Prosody Prediction for Unit Selection Speech Synthesis

This paper describes the development and evaluation of a prosody prediction module for unit selection speech synthesis that is based on the notion of perceptual prominence. We outline the design principles of the module and describe its implementation in the Bonn Open Synthesis System (BOSS). Moreover, we report results of perception experiments that have been conducted in order to evaluate pro...

Identifying prosodic prominence patterns for English text-to-speech synthesis

This thesis proposes to improve and enrich the expressiveness of English Text-to-Speech (TTS) synthesis by identifying and generating natural patterns of prosodic prominence. In most state-of-the-art TTS systems the prediction from text of prosodic prominence relations between words in an utterance relies on features that very loosely account for the combined effects of syntax, semantics, word i...

Predicting gradient F0 variation: pitch range and accent prominence

Many aspects of prosody prediction in speech synthesis could be improved, from placement of symbolic accent and phrase boundary markers to control of continuously varying parameters (e.g., duration, fundamental frequency). The goal of this work is to develop algorithms for predicting aspects of fundamental frequency typically said to have gradient variation: pitch range and prominence. In addit...

Prediction of word prominence

Control of prosody is essential for the synthesis of natural sounding speech. Text-to-speech systems tend to accent too many words when taking into account only the distinction between open-class and closed-class words. In the prominence-based approach [1], the degree of accentuation of a syllable is described in terms of a gradual prominence parameter. This paper presents the calculation of th...

Improving prosodic phrase prediction by unsupervised adaptation and syntactic features extraction

In state-of-the-art speech synthesis systems, prosodic phrase prediction is the most serious problem, accounting for about 40% of text analysis errors. Two optimization strategies are proposed in this paper to deal with two major types of prosodic phrase prediction errors. First, an unsupervised adaptation method is proposed to alleviate the mismatch between training and testing. Seco...

Publication date: 2013