Segment connection networks for corpus-based speech synthesis
Abstract
This paper presents an efficient segment selection method for corpus-based speech synthesis. Traditional unit selectors use dynamic programming (DP) to find, in a fully connected segment network, the most appropriate segment sequence based on target and concatenation costs. Instead of performing a full DP search, the presented unit selector applies fast transformations to binary-valued segment connection matrices. It is also shown that this technique can be extended to supra-segmental unit selection without altering the segment size in the database.
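For reference, the conventional full DP search that the abstract contrasts against can be sketched as a Viterbi pass over a fully connected candidate lattice. This is a minimal sketch of that traditional baseline only, not the paper's binary connection-matrix method, whose details are not given here. The function name `select_units`, the cost-function signatures, and the toy segments and costs are hypothetical, chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    concat_cost: Callable[[str, str], float],
) -> Tuple[List[str], float]:
    """Hypothetical baseline: full DP (Viterbi) unit selection.

    candidates[i] lists the database segments that could realise target
    unit i (each position needs at least one candidate). Every candidate
    at position i is connected to every candidate at position i + 1,
    i.e. the segment network is fully connected.
    """
    n = len(candidates)
    if n == 0:
        return [], 0.0
    # best[i][k]: cheapest total cost of any path ending in candidate k at position i
    best: List[List[float]] = [[target_cost(0, c) for c in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, n):
        row: List[float] = []
        ptr: List[int] = []
        for c in candidates[i]:
            # Full search: score every predecessor with its concatenation cost
            costs = [best[i - 1][j] + concat_cost(p, c)
                     for j, p in enumerate(candidates[i - 1])]
            j_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[j_best] + target_cost(i, c))
            ptr.append(j_best)
        best.append(row)
        back.append(ptr)
    # Pick the cheapest final candidate and follow the back-pointers
    k = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][k]
    path: List[str] = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][k])
        k = back[i][k]
    path.reverse()
    return path, total


# Toy example (hypothetical segments and costs): pairs that were contiguous
# in the recorded corpus get zero concatenation cost, all others a penalty.
TC = {"a1": 1.0, "a2": 0.0, "b1": 0.0, "b2": 2.0, "c1": 0.0}
CONTIGUOUS = {("a1", "b1"), ("b1", "c1")}
path, total = select_units(
    [["a1", "a2"], ["b1", "b2"], ["c1"]],
    target_cost=lambda i, c: TC[c],
    concat_cost=lambda p, c: 0.0 if (p, c) in CONTIGUOUS else 5.0,
)
print(path, total)  # ['a1', 'b1', 'c1'] 1.0
```

The inner loop scores every predecessor for every candidate, so each step costs time proportional to the product of the two candidate-list sizes. That quadratic per-step work over a fully connected network is the cost that the abstract's matrix-based approach is presented as avoiding.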
Similar resources
Simple designing methods of corpus-based visual speech synthesis
This paper describes simple designing methods of corpus-based visual speech synthesis. Our approach needs only a synchronous real image and speech database. Visual speech is synthesized by concatenating real image segments and speech segments selected from the database. In order to automatically perform all processes, e.g. feature extraction, segment selection and segment concatenation, we simp...
High-Quality and Flexible Speech Synthesis with Segment Selection and Voice Conversion
Text-to-Speech (TTS) is a useful technology that converts any text into a speech signal. It can be utilized for various purposes, e.g. car navigation, announcements in railway stations, response services in telecommunications, and e-mail reading. Corpus-based TTS makes it possible to dramatically improve the naturalness of synthetic speech compared with early TTS systems. However, no general-purpos...
An Investigation of DNN-Based Speech Synthesis Using Speaker Codes
Recent studies have shown that DNN-based speech synthesis can produce more natural synthesized speech than the conventional HMM-based speech synthesis. However, an open problem remains as to whether the synthesized speech quality can be improved by utilizing a multi-speaker speech corpus. To address this problem, this paper proposes DNN-based speech synthesis using speaker codes as a simple met...
Constructing a segment database for Greek time domain speech synthesis
In this article, a methodology is presented regarding the design of a segment database for use with a time-domain speech synthesis system for the Greek language. The main issue of this process is the systematic generation of a corpus containing all possible instances of the segments for the specific language. Particular issues such as the phonetic coverage, the sentence selection as well as ite...
Conditional random fields for hierarchical segment selection in text-to-speech synthesis
In this paper we present the statistically motivated conditional random fields (CRF) approach to concatenative TTS. We use contextual CRFs for speech segment selection, and the selected segments are concatenated into an acoustic speech waveform. The CRF approach is used in our corpus-based TTS system AVISS. The acoustic synthesis module consists of trained context-dependent CRF models on a multi-l...