Duration modification using glottal closure instants and vowel onset points
Authors
Abstract
This paper proposes a method for duration (time-scale) modification using Glottal Closure Instants (GCIs, also known as instants of significant excitation) and Vowel Onset Points (VOPs). Most time-scale modification methods vary the duration of speech segments uniformly over all regions. However, consonant regions, the transition regions between a consonant and the following vowel, and the transition regions between two consonants do not vary appreciably with speaking rate. The proposed method therefore modifies duration without changing the durations of the transition and consonant regions. Vowel onset points are used to identify these regions. A VOP is the instant at which the onset of a vowel takes place, which in most cases corresponds to the transition from a consonant to the following vowel. The VOPs are computed using the Hilbert envelope of the Linear Prediction (LP) residual. The instants of significant excitation correspond to the instants of glottal closure (epochs) in voiced speech, and to some random excitations, such as the onset of a burst, in nonvoiced speech. Duration is manipulated by modifying the duration of the LP residual, with the instants of significant excitation serving as pitch markers. The modified residual is used to excite the time-varying filter whose parameters are derived from the original speech signal. The perceptual quality of the synthesized speech is found to be natural. The performance of the proposed method is compared with that of a method in which the duration of speech is modified uniformly over all regions. Samples of speech signals for different modification factors are available for listening at http://sit.iitkgp.ernet.in/∼ksrao/result.html

K. Sreenivasa Rao is with the School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur 721302, West Bengal, India.
B. Yegnanarayana is with the International Institute of Information Technology (IIIT), Gachibowli, Hyderabad 500032, Andhra Pradesh, India.
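
The idea in the abstract of leaving consonant and transition regions unchanged while scaling everything else can be illustrated with a small time-mapping sketch. This is not the authors' implementation. The function names, the fixed window placed around each VOP (`before`, `after`), and the example values are all assumptions made for illustration; the abstract does not say how wide the protected regions are. The sketch covers only the duration plan. Detecting the VOPs and modifying the LP residual at the epochs are not shown.

```python
from bisect import bisect_right


def protected_intervals(vops, total, before=0.03, after=0.03):
    """Merge a fixed window around each VOP into sorted, disjoint intervals.

    The window widths (in seconds) are hypothetical placeholders.
    """
    merged = []
    for v in sorted(vops):
        start, end = max(0.0, v - before), min(total, v + after)
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def build_time_map(total, protected, alpha):
    """Return breakpoints (orig, new) of a piecewise-linear time map.

    Protected intervals keep their duration (slope 1); all other
    regions are scaled by the modification factor alpha.
    """
    orig, new = [0.0], [0.0]
    cursor = 0.0
    for start, end in protected:
        if start > cursor:
            # Unprotected stretch before this interval: scale by alpha.
            orig.append(start)
            new.append(new[-1] + alpha * (start - cursor))
        # Protected stretch: duration is copied unchanged.
        orig.append(end)
        new.append(new[-1] + (end - start))
        cursor = end
    if cursor < total:
        orig.append(total)
        new.append(new[-1] + alpha * (total - cursor))
    return orig, new


def map_time(t, orig, new):
    """Map an original time instant to the modified time axis."""
    i = min(max(bisect_right(orig, t) - 1, 0), len(orig) - 2)
    span = orig[i + 1] - orig[i]
    frac = (t - orig[i]) / span if span > 0 else 0.0
    return new[i] + frac * (new[i + 1] - new[i])


if __name__ == "__main__":
    # Hypothetical 1 s utterance with one VOP at 0.5 s and a 0.1 s window each side.
    regions = protected_intervals([0.5], total=1.0, before=0.1, after=0.1)
    orig, new = build_time_map(1.0, regions, alpha=2.0)
    print(regions, new[-1])
```

With these assumed values, the protected 0.2 s region keeps its length while the remaining 0.8 s is doubled, so the result is shorter than what uniform scaling by the same factor would produce. The factor applied to the unprotected regions would need to be adjusted if a specific overall duration is required.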
Similar resources
Vowel onset point detection for noisy speech using spectral energy at formant frequencies
In this paper, we propose a method for robust detection of the vowel onset points (VOPs) from noisy speech. The proposed VOP detection method exploits the spectral energy at formant frequencies of the speech segments present in glottal closure region. In this work, formants are extracted by using group delay function, and glottal closure instants are extracted by using zero frequency filter bas...
Prosodic manipulation using instants of significant excitation
This paper proposes a technique for prosodic (pitch and duration) manipulation using instants of significant excitation. Instants of significant excitation correspond to the instants of glottal closure (epochs) in voiced speech and to some random excitations like burst onset in the case of nonvoiced speech. Instants of significant excitation are computed from the average group delay of minimum ...
Significance of instants of significant excitation for source modeling
The objective of this work is to demonstrate the significance of instants of significant excitation for source modeling. Instants of significant excitation correspond to the glottal closure, glottal opening, onset of burst, frication and a small number of excitation instants around them. The speech signal is processed independently by zero frequency filtering (ZFF) to obtain epochs. The epochs ...
Emotion conversion using Feedforward Neural Networks
An emotion is made of several components such as physiological changes in the body, subjective feelings, and expressive behaviours. These changes in speech signal are mainly observed in prosody parameters such as pitch, duration and energy. In this work, prosody parameters are modified using instants of significant excitation (epochs) and these instants are detected using Zero Frequency Filteri...
Automatic pitch marking and reconstruction of glottal closure instants from noisy and deformed electro-glotto-graph signals
Pitch tracking and pitch marking (PM) are two important speech signal analysis techniques for several applications. The accuracy of both pitch marking and tracking is significant to generate smooth synthesized speech by controlling the pitch and duration of voiced speech in Text-to-Speech (TTS) system for example. In this paper, we present a novel hybrid approach, combining electro-glotto-graph...
Journal title:
- Speech Communication
Volume 51, issue
Pages -
Publication date: 2009