From text to formants - indirect model for trajectory prediction based on a multi-speaker parallel speech database
Authors
Abstract
An indirect model is presented that estimates formant trajectories from text alone (Text-to-Formants, TTF). The result is a phonetically correct formant trajectory for any virtual speech signal, i.e. one that has never been uttered. The focus is on the trajectory patterns within a given sound, taking into account its phonetic context (up to quinphone), rather than on individual formant value measurements. The model is based on a multi-speaker parallel speech database with precise manual corrections and an HMM-based formant trajectory predictor. Validation of the TTF model shows that formant trajectories can be predicted from text with good accuracy. The model indirectly provides information about a theoretically possible articulation flow of the sentence and thus gives a general ‘formantprint’ of the language.
Similar resources
A Comparative Study of Gender and Age Classification in Speech Signals
Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...
Formant Tracking Linear Prediction Model using HMMs for Noisy Speech Processing
This paper presents a formant-tracking linear prediction (FTLP) model for speech processing in noise. The main focus of this work is the detection of formant trajectories based on Hidden Markov Models (HMMs), for improved formant estimation in noise. The approach proposed in this paper provides a systematic framework for modelling and utilization of a time sequence of peaks which satisfies continui...
The Use of Group Delay Features of Linear Prediction Model for Speaker Recognition
A new text-independent speaker identification method is presented. The phase spectrum of an all-pole linear prediction (LP) model is used to derive the speech features. The features are represented by pairs of numbers calculated from the group delay extrema of the LP model spectrum. The first component of the pair is the argument of a maximum of the group delay of the all-pole LP model spectrum and the second is...
Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model
Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....
Comparative Analysis of Formants of British, American and Australian Accents
This paper compares and quantifies the differences between formants of speech across accents. The cross entropy information measure is used to compare the differences between the formants of the vowels of three major English accents namely British, American and Australian. An improved formant estimation method, based on a linear prediction (LP) model feature analysis and a hidden Markov model (...