speech synthesis method

Classification of Public Transport Information Dialogues Using an Information-Based Coding Scheme

1996

Robert J. van Vark J. P. M. de Vreught Léon J. M. Rothkrantz

The goal of a recently started research project of OVR is to develop a system to automate part of its dialogues now held by human operators at its call centres. To achieve a well thought-out design of such a system both an analysis of the current dialogues and an experiment determining which form of dialogues would be favoured by humans were started. In this paper we will emphasize the analysis...


Speech-Rate-Variable HMM-Based Japanese TTS System

2002

Koji Iwano Masahiro Yamada Taro Togawa Sadaoki Furui

This paper proposes a new method for controlling phoneme duration according to arbitrary target speech rate in speech synthesis (TTS, text-to-speech) systems. The proposed method first constructs three fundamental duration models at “fast”, “normal”, and “slow” speech rates using Hayashi’s Quantification Theory (Type 1) based on real speech databases and creates a duration model according to a ...


Voice Chat with a Virtual Character: The Good Soldier Svejk Case Project

2002

Jan Nouza Petr Kolár Josef Chaloupka

In this paper we present our initial attempt to link speech processing technology, namely continuous speech recognition, text-to-speech synthesis and artificial talking head, with text processing techniques in order to design a Czech demonstration system that allows for informal voice chatting with virtual characters. Legendary novel figure Svejk is the first personality who can be interviewed ...


Using speech recognition to evaluate skills in spoken English

2001

Rebecca Hincks

This paper analyzes some of the results of the use of PhonePass, a telephone-based test of spoken English that uses automatic speech recognition. It finds that the test provides sensitive measures of speech rate and phonetic accuracy.


Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP

2012

Yamato Ohtani Masatsune Tamura Masahiro Morita Takehiko Kagoshima Masami Akamine

This paper describes a statistical spectral parameter emphasis technique for HMM-based speech synthesis using mel-scaled line spectral pair (mel-LSP). Spectral parameter emphasis is effective for compensating over-smoothed spectra in HMM-based speech synthesis. However, there is no conventional technique that satisfies such requirements as automatic tuning for different speakers and realtime sy...


Foreign-language Speech Synthesis

1998

Nick Campbell

This paper describes a method of concatenative speech synthesis for producing speech in a language other than that of the database speaker. In certain applications, such as interpreted dialogues or multi-lingual e-mail, it is necessary to synthesise words that are foreign with respect to the language of the main text. In this case, rather than switch voices, we show that the use of an intermedi...


Deep Denoising Auto-encoder for Statistical Speech Synthesis

Journal: CoRR 2015

Zhenzhou Wu Shinji Takaki Junichi Yamagishi

This paper proposes a deep denoising auto-encoder technique to extract better acoustic features for speech synthesis. The technique allows us to automatically extract low-dimensional features from high dimensional spectral features in a non-linear, data-driven, unsupervised way. We compared the new stochastic feature extractor with conventional mel-cepstral analysis in analysis-by-synthesis and...


Speech-Rate-Variable HMM-Based Japanese TTS System

2017

Koji Iwano Masahiro Yamada Taro Togawa Sadaoki Furui

This paper proposes a new method for controlling phoneme duration according to arbitrary target speech rate in speech synthesis (TTS, text-to-speech) systems. The proposed method first constructs three fundamental duration models at “fast”, “normal”, and “slow” speech rates using Hayashi’s Quantification Theory (Type 1) based on real speech databases and creates a duration model according to a ...


Formant diphone parameter extraction utilising a labelled single-speaker database

1998

Robert H. Mannell

This paper examines a method for formant parameter extraction from a labelled single-speaker database for use in a formant-parameter diphone-concatenation speech synthesis system. This procedure commences with an initial formant analysis of the labelled database, which is then used to obtain formant (F1-F5) probability spaces for each phoneme. These probability spaces guide a more careful speaker...


The structural design of the CSTR text-to-speech system

1987

Mike McAllister

As part of the Text-to-Speech research at Edinburgh University's Centre for Speech Technology Research (EU_CSTR), a modular, linguistic-knowledge-based text-to-phoneme system has been implemented in Prolog. Its design considerations, the structure and coverage of its rule bases, and typical output from the system are described in the body of this paper.
