A Flexible, Scalable Finite-state Transducer Architecture for Corpus-based Concatenative Speech Synthesis1
نویسندگان
چکیده
In this paper we describe our work involving the conversion of our phonologically-based synthesizer into a finite-state transducer (FST) representation which can be used for real-time natural-sounding synthesis. We have designed a transducer structure to efficiently perform the common task of unit selection in concatenative speech synthesis. By encapsulating domainindependent concatenative synthesis costs into a constraint kernel, we have obtained a topology that scales linearly with the size of the synthesis corpus. The FST representation provides a flexible, unified framework in which we can leverage our previous work in speech recognition in areas such as pronunciation modelling and search. The FST synthesizer has been incorporated into two servers which operate within our conversational system architecture to convert meaning representations into waveforms. We have had preliminary success with the new FST-based synthesis in several constrained spoken dialogue applications.
منابع مشابه
A flexible, scalable finite-state transducer architecture for corpus-based concatenative speech synthesis
In this paper we describe our work involving the conversion of our phonologically-based synthesizer into a finite-state transducer (FST) representation which can be used for real-time natural-sounding synthesis. We have designed a transducer structure to efficiently perform the common task of unit selection in concatenative speech synthesis. By encapsulating domainindependent concatenative synt...
متن کاملJoint prosody prediction and unit selection for concatenative speech synthesis
In this paper we describe how prosody prediction can be efficiently integrated with the unit selection process in a concatenative speech synthesizer under a weighted finite-state transducer (WFST) architecture. WFSTs representing prosody prediction and unit selection can be composed during synthesis, thus effectively expanding the space of possible prosodic targets. We implemented a symbolic pr...
متن کاملPreliminary Evaluations of a WFST Speech Decoder
In this paper we present preliminary evaluations on the large vocabulary speech decoder we are currently developing at Tokyo Institute of Technology. Our goal is to build a scalable and flexible decoder to operate on weighted finite state transducer (WFST) search spaces. Even though the development of the decoder is still in its infancy we are already achieving good accuracy and speed on a larg...
متن کاملArchitectures for Speech-to-Speech Translation Using Finite-state Models
Speech-to-speech translation can be approached using finite state models and several ideas borrowed from automatic speech recognition. The models can be Hidden Markov Models for the accoustic part, language models for the source language and finite state transducers for the transfer between the source and target language. A “serial architecture” would use the Hidden Markov and the language mode...
متن کاملUnit selection for speech synthesis using splicing costs with weighted finite state transducers
In this paper we describe how unit selection for concatenative speech synthesis can be implemented efficiently for sub-phonetic units using weighted finite state transducers (WFST). We also introduce splicing costs as a measure to indicate which unit boundaries are particularly good or poor joint points. Splicing costs extend the flexibility offered by the unit selection paradigm. Through a per...
متن کامل