Reducing the corpus-based TTS signal degradation due to speaker's word pronunciations

Authors

Sérgio Paulo

Luís C. Oliveira

Abstract

The goal of producing a corpus-based synthesizer with the owner’s voice can only be achieved if the system can handle recordings with less-than-ideal characteristics. One limitation is that a typical speaker does not always pronounce a word exactly as the language rules predict. In this work we compare two methods for handling variation in word pronunciation in corpus-based speech synthesizers. Both approaches rely on a speech corpus aligned with a phone-level segmentation tool that allows alternative word pronunciations. The first approach aligns the observed pronunciation with the canonical form used in the system’s lexicon, so that the time labels of the observed phones can be mapped onto the canonical form. At synthesis time, unit selection is then performed on the phone sequence predicted by the system. In the second approach, the phone sequence generated by the segmentation tool is left unmodified, and at synthesis time words are converted into phones using the speaker’s own pronunciation rather than the system’s lexicon. Finally, the two approaches are compared by evaluating the naturalness of the signals each one generates.
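The first approach can be illustrated with a small sketch. The abstract does not say which alignment algorithm the authors use, so the code below assumes a plain Levenshtein alignment with unit costs. The function names `align` and `map_time_labels`, and the policies for phones the speaker dropped (a zero-length segment) or added (absorbed into the preceding canonical phone), are hypothetical choices made for illustration only.

```python
from typing import List, Optional, Tuple

Segment = Tuple[str, float, float]  # (phone, start_time, end_time)
Pair = Tuple[Optional[int], Optional[int]]  # (canonical index, observed index)


def align(canonical: List[str], observed: List[str]) -> List[Pair]:
    """Levenshtein alignment with unit costs (an assumed choice)."""
    n, m = len(canonical), len(observed)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if canonical[i - 1] == observed[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match / substitution
                             cost[i - 1][j] + 1,        # canonical phone not spoken
                             cost[i][j - 1] + 1)        # extra spoken phone
    # Backtrace from the bottom-right corner to recover the aligned pairs.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if canonical[i - 1] == observed[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def map_time_labels(canonical: List[str],
                    observed: List[Segment]) -> List[Segment]:
    """Transfer observed time labels onto the canonical phone sequence."""
    pairs = align(canonical, [phone for phone, _, _ in observed])
    mapped: List[Segment] = []
    last_end = observed[0][1] if observed else 0.0
    for ci, oj in pairs:
        if ci is not None and oj is not None:
            # Canonical phone has an observed counterpart: copy its times.
            _, start, end = observed[oj]
            mapped.append((canonical[ci], start, end))
            last_end = end
        elif ci is not None:
            # Assumed policy: a phone the speaker dropped gets zero duration.
            mapped.append((canonical[ci], last_end, last_end))
        else:
            # Assumed policy: an extra spoken phone extends the previous
            # canonical segment (it is skipped if there is none yet).
            _, _, end = observed[oj]
            if mapped:
                phone, start, _ = mapped[-1]
                mapped[-1] = (phone, start, end)
            last_end = end
    return mapped


# Made-up example: canonical "p a r a" spoken as "p r a".
spoken = [("p", 0.00, 0.05), ("r", 0.05, 0.09), ("a", 0.09, 0.20)]
print(map_time_labels(["p", "a", "r", "a"], spoken))
```

In this made-up example the dropped vowel receives a zero-length segment at 0.05 s, so the canonical sequence stays complete for unit selection. A real system would probably weight substitutions by phonetic similarity rather than use unit costs.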

Similar resources

Reducing the Corpus-based TTS Signal Deg Pronunciatio

Pronunciation lexicon adaptation for TTS voice building

This paper describes reducing phone label errors in TTS voice building by means of modeling of speaker pronunciation variants. Each speaker has his or her own unique pronunciations (and context-dependent variations), so that no one standard lexicon is able to cover all of the speaker’s variations. Creating speaker-dependent pronunciation lexicons for automatic speech labeling of our TTS voice d...

A comparison of pronunciation modeling approaches for HMM-TTS

Hidden Markov model-based text-to-speech (HMM-TTS) systems are often trained on manual voice corpus phonetic transcriptions, despite the fact that because these manual pronunciations cannot be predicted with complete accuracy at synthesis time, the result is training/synthesis mismatch. In this paper, an alternate approach is proposed in which a set of manually written post-lexical effects (PLE...

Optimal Feature Set and Minimal Training Size for Pronunciation Adaptation in TTS

Text-to-Speech (TTS) systems rely on a grapheme-to-phoneme converter which is built to produce canonical, or statically stylized, pronunciations. Hence, the TTS quality drops when phoneme sequences generated by this converter are inconsistent with those labeled in the speech corpus on which the TTS system is built, or when a given expressivity is desired. To solve this problem, the present work...

Learning Similarity Functions for Pronunciation Variations

A significant source of errors in Automatic Speech Recognition (ASR) systems is due to pronunciation variations which occur in spontaneous and conversational speech. Usually ASR systems use a finite lexicon that provides one or more pronunciations for each word. In this paper, we focus on learning a similarity function between two pronunciations. The pronunciations can be the canonical and the ...

Publication date: 2005
