Template-driven generation of prosodic information for Chinese concatenative synthesis
Abstract
In this paper, a template-driven generation of prosodic information is proposed for Chinese text-to-speech conversion. A set of monosyllable-based synthesis units is selected from a large continuous speech database. The speech database is employed to establish a word-prosody-based template tree according to the linguistic features: tone combination, word length, part-of-speech (POS) of the word, and word position in a sentence. This template tree stores the prosodic features including pitch contour, average energy, and syllable duration of a word for possible combinations of linguistic features. Two modules for sentence intonation and template selection are proposed to generate the target prosodic templates. The experimental results for the TTS conversion system showed that synthesized prosodic features quite resembled their original counterparts for most syllables in the inside test. Evaluation by subjective experiments also confirmed the satisfactory performance of these approaches.
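As a rough illustration of the template tree described in the abstract, the sketch below indexes word-level prosodic templates by the four linguistic features named there. It is not the authors' implementation: the class names, the ordering of the tree levels, the feature encodings, and the back-off rule for unseen feature combinations are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ProsodyTemplate:
    """Word-level prosodic features (hypothetical container, one entry per syllable)."""
    pitch_contours: List[List[float]]  # pitch samples for each syllable
    avg_energy: List[float]            # average energy for each syllable
    durations_ms: List[float]          # duration of each syllable


class TemplateTree:
    """Nested-dict tree: word length -> tone combination -> POS -> sentence position.

    The level order is an assumption; the abstract lists the features
    but does not say how the tree is organised.
    """

    def __init__(self) -> None:
        self._root: Dict = {}

    def insert(self, length: int, tones: Tuple[int, ...], pos: str,
               position: str, template: ProsodyTemplate) -> None:
        node = self._root
        for feature in (length, tones, pos):
            node = node.setdefault(feature, {})
        node[position] = template

    def lookup(self, length: int, tones: Tuple[int, ...], pos: str,
               position: str) -> Optional[ProsodyTemplate]:
        node = self._root
        # Word length and tone combination must match exactly.
        for feature in (length, tones):
            node = node.get(feature)
            if node is None:
                return None
        # Assumed back-off: if the POS or position was never observed,
        # fall back to the first stored alternative at that level.
        by_pos = node[pos] if pos in node else next(iter(node.values()))
        if position in by_pos:
            return by_pos[position]
        return next(iter(by_pos.values()))
```

A lookup that misses on length or tone combination returns `None`, leaving it to the caller to decide how to synthesize that word; a real system would more likely select the closest template by some distance measure rather than the first one stored.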
Similar resources
Design and evaluation of a speech synthesis model based on concatenation of prosodic-context-sensitive units
This paper describes the design and evaluation of prosodically sensitive concatenative units for a Persian text-to-speech (TTS) synthesis system. The syllables used are prosodically conditioned in the sense that a single conventional syllable is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The three levels of the Per...
Prosody generation in Chinese synthesis using the template of quantified prosodic unit and base intonation contour
This paper presents a prosody generation method for Mandarin Chinese using the template of quantified prosodic unit and base intonation contour. The method takes as its base unit the prosodic features extracted by rule from the syllables in prosodic words, and integrates the prosody rules of Mandarin Chinese prosodic words with the base intonation contour to achieve the prosody contours with...
Choose the best to modify the least: a new generation concatenative synthesis system
The paper describes a corpus-based approach applied in the evolution of ELOQUENS, the CSELT text-to-speech synthesis system for Italian, towards multi-voice, multilanguage, high-naturalness concatenative synthesis. The acoustic modules have been redesigned, according to the idea of reducing the number of junctions and the need of prosodic modification. Appropriate phonetic coverage methods were...
Tree Mapping Template for Prosodic Phrase Boundary Prediction
This paper presents a novel method driven by a tree mapping template (TMT) that improves the accuracy of prosodic phrase boundary prediction. The TMT is capable of capturing the isomorphic relation between non-terminal nodes in the hierarchical prosodic tree and nodes in its binary tree approximation, performing pruning at the decoding phase and revising the baseline maximum entropy model with boosting m...
Utilization of an HMM-based feature generation module in 5 ms segment concatenative speech synthesis
Spectrum at each segment boundary, for calculation of the concatenation cost. (2) Synthesis stage. Text-to-Feature: generate features from the input text (linguistic/prosodic information). Feature-to-Speech: find the N-best candidates in each frame (preselection) according to the segment's target cost; find the best path from the N-best candidates based on the concatenation cost; concatenate the segments...