Segmentation of a speech waveform according to glottal open and closed phases using an autoregressive-HMM

Authors

  • Gavin Smith
  • Tony Robinson
Abstract

This paper presents an algorithm that segments speech according to glottal open and closed phases using the time waveform alone. From this segmentation, pitch, jitter and closed-to-open glottal ratios can be computed. Segmentation is achieved by identifying spectral changepoints at the sub-pitch-period timescale. Changepoints are identified using a 3-state autoregressive hidden Markov model (AR-HMM) operating on the time waveform, with the Liljencrants-Fant (LF) glottal model as a theoretical basis. The model parameters and the optimal state sequence are determined using the expectation-maximisation (EM) algorithm and a bounded state duration (BSD) Viterbi algorithm, respectively. Experiments on synthetic speech give encouraging glottal segmentation for modal, fry and breathy voice types, and experiments on real speech from TIMIT also give meaningful segmentations.
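
The decoding step described above can be illustrated with a minimal Python sketch. It assumes the AR coefficients and innovation variance of each state are already known (the EM estimation step is not shown), that the three states follow a fixed cyclic order 0 → 1 → 2 → 0 within each pitch period, and that each state has hand-chosen minimum and maximum durations in samples. The names `ar_loglik` and `bsd_viterbi`, the cyclic ordering and the bounds are hypothetical illustrations, not the paper's implementation. Duration bounds are enforced here with a segment-level (explicit-duration) Viterbi recursion, which may differ in detail from the authors' BSD Viterbi.

```python
import math
import random
from typing import List, Sequence, Tuple


def ar_loglik(x: Sequence[float], coeffs: Sequence[float], var: float) -> List[float]:
    """Per-sample Gaussian log-likelihood of x under an AR(p) model.

    x[t] is predicted from x[t-1], ..., x[t-p]; samples before the start of
    the signal are taken as zero (a simplification for this sketch).
    """
    p = len(coeffs)
    out = []
    for t in range(len(x)):
        pred = sum(coeffs[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        err = x[t] - pred
        out.append(-0.5 * math.log(2.0 * math.pi * var) - err * err / (2.0 * var))
    return out


def bsd_viterbi(loglik: List[List[float]], dmin: Sequence[int],
                dmax: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Duration-bounded Viterbi over a cyclic state order 0 -> 1 -> ... -> K-1 -> 0.

    loglik[k][t] is the log-likelihood of sample t under state k. Every segment
    of state k, including the first and last, lasts between dmin[k] and dmax[k]
    samples (a simplification). Returns (state, start, end) with end exclusive.
    """
    K, T = len(loglik), len(loglik[0])
    assert all(d >= 1 for d in dmin), "minimum durations must be positive"
    # Prefix sums make each segment's log-likelihood an O(1) difference.
    cum = [[0.0] * (T + 1) for _ in range(K)]
    for k in range(K):
        for t in range(T):
            cum[k][t + 1] = cum[k][t] + loglik[k][t]
    # best[t][k]: best score for x[0:t] whose last segment is in state k.
    # start[t][k]: where that last segment begins (used for backtracking).
    best = [[-math.inf] * K for _ in range(T + 1)]
    start = [[-1] * K for _ in range(T + 1)]
    for t in range(1, T + 1):
        for k in range(K):
            prev = (k - 1) % K  # the only state allowed to precede state k
            for d in range(dmin[k], min(dmax[k], t) + 1):
                s = t - d
                if s == 0:
                    score = 0.0  # the first segment may be in any state
                elif best[s][prev] > -math.inf:
                    score = best[s][prev]
                else:
                    continue  # no valid segmentation ends at s in state prev
                score += cum[k][t] - cum[k][s]
                if score > best[t][k]:
                    best[t][k], start[t][k] = score, s
    k = max(range(K), key=lambda j: best[T][j])
    if best[T][k] == -math.inf:
        raise ValueError("no segmentation satisfies the duration bounds")
    segments = []
    t = T
    while t > 0:
        s = start[t][k]
        segments.append((k, s, t))
        t, k = s, (k - 1) % K
    return segments[::-1]


if __name__ == "__main__":
    # Hypothetical per-state AR(1) models: (coefficients, innovation variance).
    models = [([0.9], 1.0), ([-0.9], 1.0), ([0.0], 0.1)]
    random.seed(0)
    x: List[float] = []
    for _cycle in range(3):
        for coeffs, var in models:
            for _n in range(20):
                prev = x[-1] if x else 0.0
                x.append(coeffs[0] * prev + random.gauss(0.0, math.sqrt(var)))
    ll = [ar_loglik(x, c, v) for c, v in models]
    for state, begin, end in bsd_viterbi(ll, [5, 5, 5], [40, 40, 40]):
        print(f"state {state}: samples {begin}-{end - 1}")
```

The recursion costs O(T·K·Dmax) for T samples, K states and maximum duration Dmax. Bounding durations at the segment level matters because a plain frame-level HMM implies geometric (memoryless) state durations, which can yield implausibly short or long phases within a pitch period. The resulting segment boundaries are the changepoints from which period-level measures such as pitch or closed-to-open ratios could then be derived.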

Similar resources

Time-varying autoregressive tests for multiscale speech analysis

In this paper we develop hypothesis tests for speech waveform nonstationarity based on time-varying autoregressive models, and demonstrate their efficacy in speech analysis tasks at both segmental and sub-segmental scales. Key to the successful synthesis of these ideas is our employment of a generalized likelihood ratio testing framework tailored to autoregressive coefficient evolutions suitabl...

Accuracy evaluation of esophageal voice analysis based on automatic topology generated-voicing source HMM

An Auto-Regressive eXogenous (ARX) model combined with descriptive models of the glottal source waveform has been adopted to more accurately separate the vocal tract and the voicing source. However, these methods cannot be easily applied to the analysis of voices uttered by different speech production methods, such as esophageal voice. We previously proposed the Voicing Source Hidden Markov Mod...

Voiced Speech Synthesis Using Pitch Asynchronous Code Excited Linear Filters for the Glottal Source

This paper proposes a model for natural-quality voiced speech synthesis that uses a code-excited linear all-pole filter to model the glottal source signal. Classical glottal signal models are explicit-time functions, which inhibit joint source-tract parameter estimation and require pitch-synchronous estimation with precise segmentation of the open and closed glottis phases. These problems are overcome i...

Glottal closure and opening detection for flexible parametric voice coding

Knowledge of glottal closure and opening instants (GCI/GOI) is useful for many speech analysis applications. Pitch-synchronous waveform encoding of voice is one such application. In this paper, dynamic programming is employed to solve for the global closed/open phase segmentation based on the polynomial parametric waveform of the derivative glottal waveform and its quasi-periodicity. Not ...

On the Inverse Filtering of Speech (MSc Thesis), George P. Kafentzis, Heraklion, October 2010

George P. Kafentzis, Computer Science Department, Master of Science. In all proposed source-filter models of speech production, Inverse Filtering (IF) is a well-known technique for obtaining the glottal flow waveform, which acts as the source in the vocal tract system. The estimation of glottal flow is of high interest in a variety of speech areas, such as voice ...

Publication date: 2000