Glottal source and vocal-tract separation: Estimation of glottal parameters, voice transformation and synthesis using a glottal model

Authors

  • Gilles Degottex
  • Thierry Dutoit
  • Yannis Stylianou
  • Nathalie Henrich
  • Olivier Rosec
  • Jean-Luc Zarader
  • Olivier Boëffard
Abstract

This study addresses the problem of inverting a voice production model to retrieve, for a given recording, a representation of the sound source generated at the glottis, the glottal source, and a representation of the resonances and anti-resonances of the vocal tract. This separation makes it possible to manipulate the elements composing the voice independently. Applications include those presented in this study, namely voice transformation and speech synthesis, as well as identity conversion, expressivity synthesis and voice restoration, which can be used in entertainment technologies, artistic sound installations, the movie and music industries, toys and video games, telecommunications, etc. In this study, we assume that the perceived elements of the voice can be manipulated using the well-known source-filter model. In the spectral domain, voice production is thus described as a multiplication of the spectra of its elements: the glottal source, the vocal-tract filter and the radiation. The second assumption concerns the deterministic component of the glottal source: we assume that a glottal model can fit one period of the glottal source. Using such an analytical description, the amplitude and phase spectra of the deterministic source are linked through the shape parameter of the glottal model. Given the state of the art in voice transformation and speech synthesis, the naturalness and the control of the transformed and synthesized voices should be improved. Accordingly, we try to answer the following three questions: 1) How to estimate the parameter of a glottal model? 2) How to estimate the vocal-tract filter according to this glottal model? 3) How to transform and synthesize a voiced signal using this glottal model? Special attention is given to the first question.
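The spectral-domain description of the source-filter model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the raised-sine pulse stands in for a real glottal model, the damped cosine stands in for a vocal-tract impulse response, and the radiation is approximated by a first-order difference. The function name `source_filter_spectrum` is hypothetical.

```python
import numpy as np

N_FFT = 1024  # long enough that circular and linear convolution coincide here

def source_filter_spectrum(glottal_pulse, tract_ir, n_fft=N_FFT):
    """Speech spectrum as the product of source, filter and radiation spectra."""
    G = np.fft.rfft(glottal_pulse, n_fft)           # glottal source spectrum
    C = np.fft.rfft(tract_ir, n_fft)                # vocal-tract filter spectrum
    L = np.fft.rfft(np.array([1.0, -1.0]), n_fft)   # radiation as a first-order difference (assumption)
    return G * C * L

# Toy glottal pulse: a raised-sine bump (illustrative only, not a glottal model from the study).
n = np.arange(64)
pulse = np.sin(np.pi * n / 64) ** 2

# Toy vocal-tract impulse response: one damped resonance near 700 Hz at 16 kHz sampling.
m = np.arange(200)
tract_ir = 0.97 ** m * np.cos(2 * np.pi * 700 / 16000 * m)

# Spectral-domain model: multiplication of the three spectra.
S = source_filter_spectrum(pulse, tract_ir)

# Time-domain counterpart: convolution of the same three elements.
speech = np.convolve(np.convolve(pulse, tract_ir), [1.0, -1.0])
S_time = np.fft.rfft(speech, N_FFT)
```

Under these assumptions, `S` and `S_time` agree up to floating-point error, which is the multiplication-of-spectra property the source-filter model relies on.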
We first assume that the glottal source and the impulse response of the vocal-tract filter are mixed-phase and minimum-phase signals, respectively. Based on these properties, various methods are proposed which minimize the mean squared phase of the convolutive residual of an observed spectrum and its model. A last method is described in which a single shape parameter is given by a quasi-closed-form expression of the observed spectrum. Additionally, this study discusses the conditions that a glottal model and its parametrization must satisfy to ensure that parameter estimation is reliable with the proposed methods. These methods are also evaluated and compared to state-of-the-art methods using synthetic and electroglottographic signals. With one of the proposed methods, the estimation of the shape parameter is independent of the position and the amplitude of the glottal model. Moreover, it is shown that this same method outperforms all the compared methods. To answer the second and third questions, we propose an analysis/synthesis procedure which estimates the vocal-tract filter according to an observed spectrum and its estimated source. Preference tests have been carried out, and their results are presented to compare the proposed procedure to existing ones. In terms of pitch transposition, it is shown that the overall quality of the voiced segments of a recording can be improved for large transposition factors. It is also shown that the breathiness of a voice can be controlled.
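The idea of minimizing the mean squared phase of a convolutive residual can be sketched on a toy problem. This is not the study's algorithm: the "glottal model" below is a hypothetical one-pole, purely maximum-phase source with a single shape parameter `theta`, the time position is assumed known, the phase is averaged over all FFT bins rather than over harmonics, and the search is a plain grid search. For each candidate shape, the source is divided out of the observed spectrum, the remainder is replaced by its minimum-phase counterpart (real-cepstrum folding), and the phase of what is left is measured.

```python
import numpy as np

N_FFT = 1024
omega = 2 * np.pi * np.arange(N_FFT) / N_FFT

def toy_glottal_spectrum(theta):
    """Hypothetical maximum-phase source: one anticausal pole at z = 1/theta."""
    return 1.0 / (1.0 - theta * np.exp(1j * omega))

def minimum_phase_spectrum(magnitude):
    """Minimum-phase spectrum with the given magnitude, via real-cepstrum folding."""
    cep = np.fft.ifft(np.log(magnitude)).real
    half = N_FFT // 2
    folded = np.zeros(N_FFT)
    folded[0] = cep[0]
    folded[1:half] = 2.0 * cep[1:half]   # move anticausal cepstrum to the causal side
    folded[half] = cep[half]
    return np.exp(np.fft.fft(folded))

def mean_squared_phase(S, theta):
    """Mean squared phase of the residual after removing source and min-phase filter."""
    Q = S / toy_glottal_spectrum(theta)            # remove the candidate source
    R = Q / minimum_phase_spectrum(np.abs(Q))      # remove the min-phase filter estimate
    return np.mean(np.angle(R) ** 2)

# Synthetic observation: toy source (theta = 0.7) times a minimum-phase two-pole filter.
r, phi = 0.95, 2 * np.pi * 700 / 16000
C = 1.0 / (1 - 2 * r * np.cos(phi) * np.exp(-1j * omega) + r**2 * np.exp(-2j * omega))
theta_true = 0.7
S = toy_glottal_spectrum(theta_true) * C

# Grid search over the shape parameter.
candidates = np.linspace(0.3, 0.9, 13)
errors = [mean_squared_phase(S, t) for t in candidates]
theta_hat = candidates[int(np.argmin(errors))]
```

In this toy setting the residual phase vanishes only when the candidate source matches the true one, because any mismatch leaves a maximum-phase factor that the minimum-phase filter estimate cannot absorb.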


Similar resources

Steady Flow Through Modeled Glottal Constriction

The airflow in the modeled glottal constriction was simulated by the solutions of the Navier-Stokes equations for laminar flow, and the corresponding Reynolds equations for turbulent flow in generalized, nonorthogonal coordinates using a numerical method. A two-dimensional model of laryngeal flow is considered and aerodynamic properties are calculated for both laminar and turbulent steady flows...


Shape parameter estimate for a glottal model without time position

From a recorded speech signal, we propose to estimate a shape parameter of a glottal model without estimating its time position. Indeed, the literature usually proposes to estimate the time position first (e.g., by detecting Glottal Closure Instants). The vocal-tract filter estimate is expressed as a minimum-phase envelope estimation after removing the glottal model and a standard lips radiation m...


Mixed source model and its adapted vocal tract filter estimate for voice transformation and synthesis

In current methods for voice transformation and speech synthesis, the vocal tract filter is usually assumed to be excited by a flat amplitude spectrum. In this article, we present a method using a mixed source model defined as a mixture of the Liljencrants–Fant (LF) model and Gaussian noise. Using the LF model, the basic approach of the presented work is therefore close to a vocoder using ...


Glottal Closure Instant detection from a glottal shape estimate

GCI detection is a common problem in voice analysis, used for voice transformation and synthesis. The proposed innovative idea is to use a glottal shape estimate and a standard lips radiation model instead of the common pre-emphasis when computing the vocal-tract filter estimate. The time-derivative glottal source is then computed from the division in frequency of the speech spectrum by the ...


A Review of Glottal Waveform Analysis

Glottal inverse filtering is of potential use in a wide range of speech processing applications. As the process of voice production is, to a first order approximation, a source-filter process, then obtaining source and filter components provides for a flexible representation of the speech signal for use in processing applications. In certain applications the desire for accurate inverse filterin...




Publication date: 2010