Reconciling Pronunciation Differences between the Front-End and the Back-End in the IBM Speech Synthesis System

Authors

  • Wael Hamza
  • Raimo Bakis
  • Ellen Eide
Abstract

In this paper, methods for reconciling pronunciation differences between a rule-based front-end and the pronunciations observed in a database of recorded speech are presented. The methods are applied to the IBM Expressive Speech Synthesis System [1] for both unrestricted and limited-domain text-to-speech synthesis. One method is based on constructing a multiple pronunciation lattice for the given sentence and scoring it using word and phoneme n-gram statistics computed from the target speaker’s database. A second method consists of storing observed pronunciations and introducing them as alternates in the search. We compare the strengths and weaknesses of these two methods. Results show that improvements are achieved in both limited and unrestricted domains, with the largest gains coming in the limited-domain case.
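
The first method in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes phoneme bigrams only (the abstract mentions word and phoneme n-grams), add-one smoothing, and exhaustive enumeration of lattice paths instead of a dynamic-programming search. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def train_bigrams(database):
    """Count phoneme histories and bigrams over a speaker database.

    `database` is a list of utterances, each a list of phoneme strings.
    """
    histories, bigrams = Counter(), Counter()
    for phones in database:
        seq = ["<s>"] + list(phones) + ["</s>"]
        histories.update(seq[:-1])
        bigrams.update(zip(seq[:-1], seq[1:]))
    return histories, bigrams


def log_prob(phones, histories, bigrams, vocab_size, alpha=1.0):
    """Smoothed bigram log-probability of one phoneme sequence."""
    seq = ["<s>"] + list(phones) + ["</s>"]
    total = 0.0
    for prev, cur in zip(seq[:-1], seq[1:]):
        num = bigrams[(prev, cur)] + alpha
        den = histories[prev] + alpha * vocab_size
        total += math.log(num / den)
    return total


def best_pronunciation(lattice, histories, bigrams, vocab_size):
    """Pick the highest-scoring path through a pronunciation lattice.

    `lattice` holds, for each word, a list of alternate pronunciations
    (each a list of phonemes). Every combination is scored, which is
    only practical for short sentences.
    """
    def score(path):
        flat = [p for word in path for p in word]
        return log_prob(flat, histories, bigrams, vocab_size)

    return [list(word) for word in max(product(*lattice), key=score)]


# Toy data: the speaker says "the tomato" with "ax" and "aa".
database = [["dh", "ax", "t", "ah", "m", "aa", "t", "ow"]] * 2
lattice = [
    [["dh", "iy"], ["dh", "ax"]],
    [["t", "ah", "m", "ey", "t", "ow"], ["t", "ah", "m", "aa", "t", "ow"]],
]
histories, bigrams = train_bigrams(database)
choice = best_pronunciation(lattice, histories, bigrams, vocab_size=10)
print(choice)
```

In this sketch the alternates in `lattice` could come from the front-end rules or from stored pronunciations observed in the database, which loosely mirrors the second method; how the actual system combines them is not specified in the abstract.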

Similar resources

Optimization of Text-To-Speech pho posteriori signal co

One issue arising in text-to-phone conversion is inconsistency between its output and the phonetic time-alignment of the dataset, hindering the back-end’s ability to access the best units to synthesize a text. Some such inconsistency is inevitable because dataset labeling requires allowance for alternate pronunciations of words, while the front-end typically predicts a single pronunciation for ...

Computer Assisted Pronunciation Teaching (CAPT) and Pedagogy: Improving EFL learners’ Pronunciation Using Clear Pronunciation 2 Software

This study examined the impact of Clear Pronunciation 2 software on teaching English suprasegmental features, focusing on stress, rhythm and intonation. In particular, the software covers five topics in relation to suprasegmental features including consonant cluster, word stress, connected speech, sentence stress and intonation. Seven Iranian EFL learners participated in this study. The study l...

A Corpus-Based Concatenative Speech Synthesis System for Turkish

Speech synthesis is the process of converting written text into machine-generated synthetic speech. Concatenative speech synthesis systems form utterances by concatenating pre-recorded speech units. Corpus-based methods use a large inventory to select the units to be concatenated. In this paper, we design and develop an intelligible and natural sounding corpus-based concatenative speech synthes...

Enhancement of noisy speech for noise robust front-end and speech reconstruction at back-end of DSR system

This paper presents a speech enhancement method for noise robust front-end and speech reconstruction at the back-end of Distributed Speech Recognition (DSR). The speech noise removal algorithm is based on a two stage noise filtering LSAHT by log spectral amplitude speech estimator (LSA) and harmonic tunneling (HT) prior to feature extraction. The noise reduced features are transmitted with some...

Publication date: 2004