Tree-based modeling of prosodic phrasing and segmental duration for Korean TTS systems

Authors

  • Sangho Lee
  • Yung-Hwan Oh
Abstract

This study describes the tree-based modeling of prosodic phrasing, pause duration between phrases, and segmental duration for Korean TTS systems. We collected 400 sentences from various genres and built a corresponding speech corpus uttered by a professional female announcer. The phonemic and prosodic boundaries were manually marked on the recorded speech, and morphological analysis, grapheme-to-phoneme conversion and syntactic analysis were performed on the text. A decision tree and regression trees were trained on 240 sentences (approximately 20 minutes of speech) and tested on 160 sentences (approximately 13 minutes of speech). Features for modeling prosody are proposed, and their effectiveness is measured by interpreting the resulting trees. On the test set, the misclassification rate of the decision tree was 14.46%, and the RMSEs of the regression trees predicting pause duration and segmental duration were 132 ms and 22 ms, respectively. To estimate the performance of our approach at TTS run time, we also trained and tested trees on the output of our text analyzer. In this setting, the misclassification rate for prosodic phrasing and the RMSE for pause duration on the test set were 18.49% and 134 ms, respectively.
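
The abstract reports phrasing accuracy as a misclassification rate and duration accuracy as a root-mean-square error (RMSE). The sketch below shows how those two measures are computed. It also shows the least-squares split criterion that a CART-style regression tree uses to partition data when predicting a continuous target such as pause or segment duration. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the splitting rule or software used, and the function names and toy data are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def misclassification_rate(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of items (e.g. word boundaries) whose predicted break class is wrong."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = sum(p != a for p, a in zip(predicted, actual))
    return errors / len(actual)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error, in the same unit as the inputs (e.g. milliseconds)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def _sse(values: List[float]) -> float:
    """Sum of squared errors of a node that predicts the mean of its values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_threshold_split(xs: Sequence[float],
                         ys: Sequence[float]) -> Tuple[Optional[float], float]:
    """Least-squares split on one numeric feature (hypothetical helper).

    Tries a threshold between every pair of adjacent distinct feature values
    and returns (threshold, cost), where cost is the summed SSE of the two
    child nodes. Returns (None, inf) if no split is possible.
    """
    pairs = sorted(zip(xs, ys))
    best_cost, best_thr = math.inf, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot separate identical feature values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = _sse(left) + _sse(right)
        if cost < best_cost:
            best_cost, best_thr = cost, thr
    return best_thr, best_cost


# Hypothetical toy data: break labels for four boundaries, and segment
# durations (ms) against a made-up numeric feature.
print(misclassification_rate(["none", "minor", "major", "none"],
                             ["none", "major", "major", "none"]))
print(rmse([100.0, 80.0], [110.0, 70.0]))
print(best_threshold_split([1, 2, 3, 4], [60, 62, 90, 94]))
```

With these toy inputs, one of four boundaries is mislabeled, so misclassification_rate returns 0.25, and rmse returns 10.0. For the feature values 1 to 4 paired with durations of 60, 62, 90 and 94 ms, best_threshold_split chooses the threshold 2.5, with a child SSE of 2 + 8 = 10. A full tree repeats this search over every candidate feature at every node. Growth is then limited by a stopping or pruning rule.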

Similar resources

Training prosodic phrasing rules for Chinese TTS systems

This paper describes several experiments designed to train prosodic phrasing models for Chinese TTS systems and to investigate the underlying rules that control Chinese prosody. First, we collected 559 sentences from news programs and built a large corpus for modeling Chinese prosody. Second, we selected 20 features and used classification and regression trees (CART) and transformational rule-b...

Phonetic normalization using z-score in segmental prosody estimation for corpus-based TTS system

Recently, corpus-based text-to-speech (CB-TTS) has been actively studied throughout the world. Statistical training methods are generally applied to learn prosodic rules in CB-TTS, and classification and regression trees (CART) are among the most widely used methods. In this paper, we present an efficient CART training approach based on z-score phonetic normalization. Our idea comes from the fact that...

A prosodic phrasing model for a Korean text-to-speech synthesis system

This paper presents a prosodic phrasing model for Korean to be used in a text-to-speech synthesis (TTS) system. Read text corpora were morpho-syntactically parsed and prosodically labeled following the Penn Korean Treebank [Han et al., 2002] and K-ToBI prosodic labeling conventions [Sun-Ah, 2000] respectively. Decision trees were trained with morpho-syntactic and textual distance features to pre...

Prosodic phrasing modeling for Vietnamese TTS using syntactic information

This research aims at modeling prosodic phrasing for improving the naturalness of Vietnamese (a tonal language) speech synthesis. The proposed phrasing model includes hypotheses on: (i) prosodic structure based on syntactic rules (ii) final lengthening linked to syllabic structures and tone types. Audio files in the analysis corpus are manually transcribed at the syllable level and perceived pa...

CART-based duration modeling using a novel method of extracting prosodic features

The prediction of accurate segmental durations remains a difficult problem when synthesising speech from text. Inaccurate durations are often perceptually prominent and detract from the naturalness of the synthesized speech. For a concatenative system, a statistical approach is an excellent way of predicting segmental durations. More specifically, a CART (Classification And Regression Tree) metho...

Journal:
  • Speech Communication

Volume 28

Published: 1999