Nonlinear modeling of speech Poincaré sections in combination with frequency analysis to improve speech recognition performance

Authors

Ayuob Jafari

Farshad Almasganj

Maryam Nabi Bidhendi

Abstract

This paper introduces a novel approach to improving the performance of speech recognition systems using a combination of features obtained from the speech reconstructed phase space (RPS) and from frequency-domain analysis. With an appropriate choice of embedding dimension, the reconstructed phase space is assured to be topologically equivalent to the dynamics of the speech production system, and can therefore include information that may be absent from analysis approaches based on linear methods. In addition, complicated systems such as the speech production system can exhibit cyclic and oscillatory patterns, and Poincaré sections can serve as an effective tool for analyzing such trajectories. In this research, a statistical modeling approach based on Gaussian mixture models (GMMs) was applied to the Poincaré sections of the speech RPS. The final feature set is obtained by a feature selection stage among the parameters of the GMM and the usual Mel frequency cepstral coefficients (MFCCs). An HMM-based speech recognition system and the TIMIT speech database are used to evaluate the performance of the proposed feature extraction system for isolated and continuous speech recognition. Experiments show an absolute improvement of about 5.7% in isolated phoneme recognition accuracy. The new approach is shown to be a viable and effective alternative to traditional feature extraction methods, particularly for signals such as speech with strong nonlinear characteristics.
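
The two geometric steps named in the abstract, building a reconstructed phase space and taking a Poincaré section of it, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `delay_embed` and `poincare_section`, the embedding dimension of 3, the lag of 20 samples, the choice of section plane, and the synthetic sine input are all hypothetical choices made for the example. The abstract does not give the values used in the paper.

```python
import math


def delay_embed(signal, dim, lag):
    """Time-delay embedding: point n is [x[n], x[n+lag], ..., x[n+(dim-1)*lag]]."""
    n_points = len(signal) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("signal too short for this dim and lag")
    return [[signal[n + k * lag] for k in range(dim)] for n in range(n_points)]


def poincare_section(points, normal, offset=0.0):
    """Intersections of the trajectory with the hyperplane normal . p = offset.

    Only crossings from the negative side to the non-negative side are kept,
    and each intersection is located by linear interpolation between the two
    consecutive trajectory points that straddle the plane.
    """
    def side(p):
        return sum(a * b for a, b in zip(normal, p)) - offset

    hits = []
    for p, q in zip(points, points[1:]):
        sp, sq = side(p), side(q)
        if sp < 0.0 <= sq:
            t = sp / (sp - sq)  # fraction of the step from p to q
            hits.append([a + t * (b - a) for a, b in zip(p, q)])
    return hits


# Toy input (assumption): a 100 Hz sine sampled at 8 kHz stands in for a
# voiced speech frame; real speech would be far less regular.
fs, f0 = 8000, 100
signal = [math.sin(2 * math.pi * f0 * n / fs + 0.3) for n in range(800)]

rps = delay_embed(signal, dim=3, lag=20)                 # reconstructed phase space
section = poincare_section(rps, normal=[1.0, 0.0, 0.0])  # plane: first coordinate = 0
```

For this periodic toy signal the section points cluster tightly, one per cycle. For real speech the points would scatter, and their distribution is what a Gaussian mixture model could then summarize, for example with scikit-learn's `GaussianMixture`; that fitting step is left out here to keep the sketch dependency-free.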

Similar resources

Statistical modeling of speech Poincaré sections in combination of frequency analysis to improve speech recognition performance.

This paper introduces a combinational feature extraction approach to improve speech recognition systems. The main idea is to simultaneously benefit from some features obtained from Poincaré section applied to speech reconstructed phase space (RPS) and typical Mel frequency cepstral coefficients (MFCCs) which have a proved role in speech recognition field. With an appropriate dimension, the reco...

The analysis of the role of the speech acts theory in translating and dubbing Hollywood films

Among the most central effects a feature film creates are the dialogues spoken by its actors. From a filmmaker's point of view, one way of affecting the audience with the intended effect is the force of the speaker's utterances, such as emotional, frightening, sad, or exciting force. This study examines whether the illocutionary force of the actors' utterances, treated as speech acts, in five Hollywood films is reproduced in the dubbed versions...

Modeling Speech Repairs and Intonational Phrasing to Improve Speech Recognition

The spontaneous speech events of speech repairs and intonational phrasing cause disruptions in the local context, and this disruption prevents traditional language models from being able to properly predict the words in the vicinity of these events. The solution is to use a language model that can account for these spontaneous speech events. In this paper, we use such a model to rescore word gr...

Using semantic analysis to improve speech recognition performance

Although syntactic structure has been used in recent work in language modeling, there has not been much effort in using semantic analysis for language models. In this study, we propose three new language modeling techniques that use semantic analysis for spoken dialog systems. We call these methods concept sequence modeling, two-level semantic-lexical modeling, and joint semantic-lexical modeli...

An Introduction to Speech Sciences (Acoustic Analysis of Speech)

Speech sciences deal with the acoustical characteristics of speech by means of sophisticated software and hardware. Although speech science is well known in developed countries, especially Western societies, it has remained almost unknown in Iran, though in recent years a group of scholars has become involved in this branch of science. The applicati...

Improving the performance of MFCC for Persian robust speech recognition

Mel frequency cepstral coefficients are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...

Journal title:
The Modares Journal of Electrical Engineering

Publisher: Tarbiat Modares University

ISSN 2228-527X

Volume 10

Issue 3, 2011
