Automatic Selection of Synthesis Units from a Large Speech Database
Authors
Abstract
In this paper, a novel method for the selection of synthesis units is proposed. Monosyllables are adopted as the basic synthesis units. A set of high-quality synthesis units is selected from a large continuous speech database through four procedures: pitch period detection and smoothing, speech unit filtering, unit selection, and manual examination. Two cost functions, which minimize the inter- and intra-syllable distortion, are proposed for obtaining the synthesis units. The cost functions take into account parameters including the prosodic features, the LSP (line spectral pair) frequencies, and the types of syllable concatenation. Experimental results showed a match rate of 48.9%, indicating that about half of the "best" synthesis units can be obtained automatically. A replacement rate of 4.8% was also obtained.
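The abstract does not give the form of the two cost functions, so the following is only a minimal sketch of how a cost-based selection and a match rate could be computed. It rests on stated assumptions rather than on the paper's actual formulas: the intra-syllable cost is taken to be the Euclidean distance of a candidate's feature vector from the centroid of all candidates for the same syllable, the inter-syllable cost is taken to be a precomputed number per candidate, and the match rate is taken to be the fraction of syllables for which the automatic pick agrees with a manually chosen reference unit. The names `select_units`, `match_rate`, `w_intra`, and `w_inter` are hypothetical.

```python
from math import sqrt


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_units(candidates, w_intra=1.0, w_inter=1.0):
    """Pick one unit per syllable by minimizing a weighted cost.

    candidates maps a syllable label to a list of
    (unit_id, feature_vector, inter_cost) tuples.
    Both cost definitions are illustrative assumptions.
    """
    selected = {}
    for syllable, units in candidates.items():
        center = centroid([features for _, features, _ in units])

        def total_cost(unit):
            _, features, inter_cost = unit
            # Assumed intra-syllable cost: distance from the centroid.
            return w_intra * euclidean(features, center) + w_inter * inter_cost

        selected[syllable] = min(units, key=total_cost)[0]
    return selected


def match_rate(selected, reference):
    """Fraction of syllables whose selected unit equals the reference unit."""
    hits = sum(1 for syl, unit_id in selected.items()
               if reference.get(syl) == unit_id)
    return hits / len(selected)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy = {
        "ba": [("ba_1", [0.0, 0.0], 0.5),
               ("ba_2", [1.0, 1.0], 0.1),
               ("ba_3", [2.0, 2.0], 0.9)],
        "ma": [("ma_1", [0.0], 0.0),
               ("ma_2", [4.0], 0.3)],
    }
    picks = select_units(toy)
    print(picks)
    print(match_rate(picks, {"ba": "ba_2", "ma": "ma_2"}))
```

In a real system the feature vectors would hold quantities such as the prosodic features and LSP frequencies mentioned above, and the weights would need tuning against the manually examined units.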
Similar resources
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by enabling natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Unit selection in a concatenative speech synthesis system using a large speech database
One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database ca...
Automatically clustering similar units for unit selection in speech synthesis
This paper describes a new method for synthesizing speech by concatenating sub-word units from a database of labelled speech. A large unit inventory is created by automatically clustering units of the same phone class based on their phonetic and prosodic context. The appropriate cluster is then selected for a target unit offering a small set of candidate units. An optimal path is found through ...
Join Cost for Unit Selection Speech Synthesis
In unit-selection speech synthesis systems, synthetic speech is produced by concatenating speech units selected from a large database, or inventory, which contains many instances of each speech unit with varied prosodic and spectral characteristics. Hence, by selecting an appropriate sequence of units, it is possible to synthesize highly natural-sounding speech. The selection of the best unit s...
Segment selection in the L&H RealSpeak laboratory TTS system
The L&H RealSpeak Laboratory TTS (RSLab) system is a corpus based speech synthesis system comprising components that deal with linguistic processing, prosody prediction, segment selection, concatenation and modification. In this paper we focus on the segment selection process. During segment selection, the units in a large database of speech are scored with a cost according to their prosodic/ph...