Speech recognition using localized affine invariant features

Authors

Masayuki SUZUKI

Yu QIAO

Nobuaki MINEMATSU

Keikichi HIROSE

Abstract

This paper proposes localized affine invariant features (LAIFs) for speaker-independent automatic speech recognition. LAIFs can be calculated directly from data sequences. Because speaker variations can be approximated well by an affine transform in cepstral space, LAIFs provide features that are robust to those variations. We therefore expect LAIFs to improve recognition performance, especially when no training data is available for speaker normalization or adaptation. To verify this expectation, we apply LAIFs to isolated word recognition. The experimental results show that combining LAIFs with MFCC or MFCC+∆MFCC yields higher performance than MFCC or MFCC+∆MFCC alone. In mismatched conditions in particular, MFCC+∆MFCC+LAIFs reduces the error rate by 37% compared with MFCC+∆MFCC alone.
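The abstract does not define how LAIFs are computed, so the following is only a minimal sketch of what affine invariance means in a feature space, not the authors' method. As a stand-in, it uses the Bhattacharyya distance between Gaussians fitted to two segments of frames, a quantity known to be unchanged when the same invertible affine map is applied to both segments. The synthetic data, the function name `bhattacharyya_distance`, and the matrix `A` and offset `b` (standing in for a speaker-dependent transform) are all assumptions made for illustration.

```python
import numpy as np

def bhattacharyya_distance(x, y):
    """Bhattacharyya distance between Gaussians fitted to frame sets x and y.

    x, y: arrays of shape (num_frames, dim), e.g. cepstral vectors.
    """
    m1, m2 = x.mean(axis=0), y.mean(axis=0)
    s1 = np.cov(x, rowvar=False)
    s2 = np.cov(y, rowvar=False)
    s = 0.5 * (s1 + s2)
    d = m1 - m2
    # Mean term: (1/8) d^T S^{-1} d
    term_mean = 0.125 * d @ np.linalg.solve(s, d)
    # Covariance term: (1/2) ln( det S / sqrt(det S1 * det S2) )
    _, logdet_s = np.linalg.slogdet(s)
    _, logdet_1 = np.linalg.slogdet(s1)
    _, logdet_2 = np.linalg.slogdet(s2)
    term_cov = 0.5 * (logdet_s - 0.5 * (logdet_1 + logdet_2))
    return term_mean + term_cov

rng = np.random.default_rng(0)
dim = 4

# Two synthetic "segments" of feature frames (hypothetical data).
seg_a = rng.normal(size=(200, dim))
seg_b = rng.normal(loc=1.0, scale=1.5, size=(200, dim))

# Hypothetical speaker transform: y = A x + b, with A kept well-conditioned.
A = 2.0 * np.eye(dim) + 0.3 * rng.normal(size=(dim, dim))
b = rng.normal(size=dim)

def affine(frames):
    return frames @ A.T + b

before = bhattacharyya_distance(seg_a, seg_b)
after = bhattacharyya_distance(affine(seg_a), affine(seg_b))

print(f"distance before transform: {before:.6f}")
print(f"distance after transform:  {after:.6f}")
```

The two printed values should agree up to floating-point error, whereas the raw frames themselves change substantially under the transform. This is the sense in which a feature built from such invariant quantities can be robust to speaker variation modeled as an affine map.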

Similar resources

Lip-reading from parametric lip contours for audio-visual speech recognition

This paper describes the incorporation of a visual lip tracking and lip-reading algorithm that utilizes the affine-invariant Fourier descriptors from parametric lip contours to improve the audio-visual speech recognition systems. The audio-visual speech recognition system presented here uses parallel hidden Markov models (HMMs), where a joint decision, using an optimal decision rule, is made af...

Classification of emotional speech using spectral pattern features

Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...

Viewpoint Invariant Pedestrian Recognition with an Ensemble of Localized Features

Viewpoint invariant pedestrian recognition is an important yet under-addressed problem in computer vision. This is likely due to the difficulty in matching two objects with unknown viewpoint and pose. This paper presents a method of performing viewpoint invariant pedestrian recognition using an efficiently and intelligently designed object representation, the ensemble of localized features (ELF...

Viewpoint Unconstrained Face Recognition Based on Affine Local Descriptors and Probabilistic Similarity

Face recognition under controlled settings, such as limited viewpoint and illumination change, can achieve good performance nowadays. However, real world application for face recognition is still challenging. In this paper, we propose using the combination of Affine Scale Invariant Feature Transform (SIFT) and Probabilistic Similarity for face recognition under a large viewpoint change. Affine ...

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

Publication date: 2008
