Efficient join cost computation for unit selection based TTS systems

Authors

  • Feng Ding
  • Jani Nurminen
  • Jilei Tian
Abstract

A new efficient join cost calculation technique for unit-selection-based synthesis is proposed. The acoustic features representing the spectral content at the unit boundaries are encoded using multi-stage vector quantization. After applying pseudo-Gray coding, the join costs are directly approximated based on the stage-wise codebook indices. As a result, both the memory requirement and the computational complexity are reduced at the same time, making the technique especially suitable for embedded text-to-speech systems. Experiments are carried out comparing the proposed scheme with the original baseline technique, which operates in a lossless manner using the uncompressed acoustic data and similarity measurement. Based on the experimental findings, the proposed technique seems to perfectly maintain the speech quality despite the considerable reduction in complexity and memory usage.
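
The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact cost formula, so everything specific here is an assumption: the toy codebooks, the function names `msvq_encode` and `approx_join_cost`, the per-stage weights, and the choice of a weighted Hamming distance between stage-wise indices as the index-domain cost. The sketch also assumes the codebooks are already ordered in a pseudo-Gray fashion, so that indices differing in few bits point to nearby codewords.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = Sequence[Vector]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def msvq_encode(vec: Vector, codebooks: Sequence[Codebook]) -> List[int]:
    """Multi-stage VQ: each stage quantizes the residual left by the previous stage."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        best = min(range(len(codebook)), key=lambda i: sq_dist(residual, codebook[i]))
        indices.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return indices


def approx_join_cost(idx_a: Sequence[int], idx_b: Sequence[int],
                     stage_weights: Sequence[float]) -> float:
    """Hypothetical index-domain cost: weighted Hamming distance per stage.

    This is an illustrative stand-in, not the formula from the paper.
    """
    return sum(w * bin(i ^ j).count("1")
               for i, j, w in zip(idx_a, idx_b, stage_weights))


# Toy two-stage codebooks (assumed); stage 1 is ordered so that indices
# one bit apart are neighbouring codewords.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # stage 1: 2 bits
    [[0.0, 0.0], [0.25, 0.25]],                         # stage 2: 1 bit
]

left_boundary = msvq_encode([1.1, 0.1], CODEBOOKS)
right_boundary = msvq_encode([0.2, 1.2], CODEBOOKS)
cost = approx_join_cost(left_boundary, right_boundary, [1.0, 0.25])
```

In a scheme like this, only the stage-wise indices need to be stored for each unit boundary, and the cost reduces to a few XOR and bit-count operations, which is consistent with the memory and complexity savings the abstract reports.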


Similar resources

Rapid unit selection from a large speech corpus for concatenative speech synthesis

Concatenative Text-to-Speech (TTS) systems such as those described by Hunt and Black [6] can select at synthesis time from a very large number of recorded units. The selected units are chosen to minimize a combination of target and join costs for a given sentence. However, the join costs, in particular, can be quite expensive to compute, even when this computation has been optimized. If possibl...
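
Minimizing a combination of target and join costs over a sentence is commonly done with a Viterbi-style dynamic program. The following is a minimal sketch under that assumption, not the method of the paper summarized above; the function name `select_units`, the callback signatures, and the toy costs in the usage lines are hypothetical. Each position is assumed to have at least one candidate unit.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")


def select_units(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> List[U]:
    """Pick one unit per position, minimizing summed target and join costs."""
    if not candidates:
        return []
    # best[k]: cheapest total cost of a path ending in candidate k of the current position
    best = [target_cost(0, u) for u in candidates[0]]
    back: List[List[int]] = []  # back[t-1][k]: best predecessor of candidate k at position t
    for t in range(1, len(candidates)):
        new_best, pointers = [], []
        for unit in candidates[t]:
            costs = [best[k] + join_cost(prev, unit)
                     for k, prev in enumerate(candidates[t - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + target_cost(t, unit))
            pointers.append(k_min)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path back from the last position to the first.
    k = min(range(len(best)), key=best.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)]


# Toy usage: units are plain numbers, the join cost is their absolute
# difference, and the target cost is zero everywhere.
chosen = select_units(
    [[0, 10], [9, 1], [2, 7]],
    target_cost=lambda position, unit: 0.0,
    join_cost=lambda a, b: float(abs(a - b)),
)
```

The inner loop evaluates the join cost for every pair of adjacent candidates, which is why this term tends to dominate the run time when the candidate lists are large.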


MARY TTS unit selection and HMM-based voices

This paper describes the implementation of a unit selection English voice and an HMM-based Hindi voice for our participation in the Blizzard Challenge 2013. The two voices have been created using the MARY TTS voice building framework. We describe how audiobook data is used to create the English voice and how a quality control measure (statistical model cost) is used to control the selection of uni...


An embedded and concatenative approach to TTS of multiple languages

This paper presents an embedded and concatenative approach to a multilingual text-to-speech system (ECMTTS). Under a uniform architecture, the TTS modules are separated into language-dependent and language-independent ones. A specifically defined super phonetic symbol set enables to use uniform spee...


On the Suitability of Vocalic Sandwiches in a Corpus-Based TTS Engine

Unit selection speech synthesis systems generally rely on target and concatenation costs for selecting the best unit sequence. The role of the concatenation cost is to ensure that joining two voice segments will not cause any acoustic artefact to appear. For this task, acoustic distances (MFCC, F0) are typically used, but in many cases this is not enough to prevent concatenation artefacts. Amon...


Efficient and Scalable Met Generation in Corpus-b

This paper proposes performance indices and search criteria for text script generation in the design of corpus-based TTS systems. Based on these criteria, a new search method is presented to solve the text selection problem more systematically and efficiently. Experimental results have shown that, with the same hit rate of unit types, the new method can reduce the text script size by up to 40% in some...




Publication date: 2008