Automatic generation of prosodic structure for high quality Mandarin speech synthesis

Authors

  • Fu-Chiang Chou
  • Chiu-yu Tseng
  • Lin-Shan Lee
Abstract

A key problem for today's speech synthesis technology is automatically generating an appropriate hierarchical prosodic structure for text input and incorporating it into synthesized speech[1][2]. This paper presents a method for addressing this problem in Mandarin Chinese. The method trains a statistical model on a speech database to generate the prosodic structure and to determine prosodic parameters such as syllable duration, pause, energy, and intonation. Experimental results show that the prosodic structure can be predicted with an accuracy of 83.1%. Furthermore, a Chinese text-to-speech system can be developed based on the proposed prosodic structure.

Similar resources

A set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese

This paper presents a set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese. A large speech corpus produced by a single speaker is used, and the speech output is synthesized from waveform units of variable lengths, with desired linguistic properties, retrieved from this corpus. Detailed methodologies were developed for designing “phonetically rich” and “prosodically ric...

High-Quality Prosody Generation in Mandarin Text-to-Speech System

A text-to-speech (TTS) synthesizer is a computer-based system that can automatically read text aloud. Fujitsu is developing a Mandarin TTS system using state-of-the-art technologies. The prosodic structure of synthesized text provides important information for making synthetic speech produced by a TTS system more natural and understandable. This paper describes a global probability estimation m...

Improved generation of prosodic features in HMM-based Mandarin speech synthesis

An HMM-based text-to-speech system can produce high quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the prosodic features, like F0 and duration trajectories, generated by HMM-based speech synthesis are often excessively smoothed and lack prosodic variance. In HMM-based TTS, durations are typically modeled statistically using state duration probabili...

Automatic Prosody Generation in a Text-to-speech System for Hebrew

The paper presents the module for automatic prosody generation within a system for automatic synthesis of high-quality speech based on arbitrary text in Hebrew. The high quality of synthesis is due to the high accuracy of automatic prosody generation, enabling the introduction of elements of natural sentence prosody of Hebrew. Automatic morphological annotation of text is based on the applicati...

Automatic prosodic break labeling for Mandarin Chinese speech data

For corpus-based speech synthesis, large quantities of labeled speech are required. Manually labeling speech data is quite labor-intensive, so automatic speech labeling is highly desirable. Prosodic break detection is one of the tasks in automatic speech labeling. In this paper, we propose an automatic break detection algorithm for Mandarin Chinese speech. In this approach, we use energy ...

Publication date: 1996