mfcc

Normalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems

2012

Marc René Schädler, Birger Kollmeier

Physiologically motivated feature extraction methods based on 2D-Gabor filters have already been used successfully in robust automatic speech recognition (ASR) systems. Recently it was shown that a Mel Frequency Cepstral Coefficients (MFCC) baseline can be improved with physiologically motivated features extracted by a 2D-Gabor filter bank (GBFB). Besides physiologically inspired approaches to ...

A Comparative Study Of LPCC And MFCC Features For The Recognition Of Assamese Phonemes

2013

Utpal Bhattacharjee

In this paper two popular feature extraction techniques Linear Predictive Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC) have been investigated and their performances have been evaluated for the recognition of Assamese phonemes. A multilayer perceptron based baseline phoneme recognizer has been built and all the experiments have been carried out using that recognize...

Development of CRIM system for the automatic speaker verification spoofing and countermeasures challenge 2015

2015

Md. Jahangir Alam, Patrick Kenny, Gautam Bhattacharya, Themos Stafylakis

The automatic speaker verification spoofing and countermeasures challenge 2015 provides a common framework for the evaluation of spoofing countermeasures or anti-spoofing techniques in the presence of various seen and unseen spoofing attacks. This contribution proposes a system consisting of amplitude, phase, linear prediction residual, and combined amplitude phase-based countermeasures for the...

Human beatbox sound recognition using an automatic speech recognition toolkit

Journal: Biomedical Signal Processing and Control, 2021

Human beatboxing is a vocal art making use of speech organs to produce drum sounds and imitate musical instruments. Beatbox sound classification is a current challenge that can be used for automatic database annotation and music-information retrieval. In this study, a large-vocabulary human-beatbox sound recognition system was developed with an adaptation of the Kaldi toolbox, a widely used tool for automatic speech recognition. The corpus ...

On combining information from modulation spectra and mel-frequency cepstral coefficients for automatic detection of pathological voices.

Journal: Logopedics, Phoniatrics, Vocology, 2011

Julián David Arias-Londoño, Juan I. Godino-Llorente, Maria Markaki, Yannis Stylianou

This work presents a novel approach for the automatic detection of pathological voices based on fusing the information extracted by means of mel-frequency cepstral coefficients (MFCC) and features derived from the modulation spectra (MS). The system proposed uses a two-stepped classification scheme. First, the MFCC and MS features were used to feed two different and independent classifiers; and...

Comparative study of GMM, DTW, and ANN on Thai speaker identification system

2000

Chularat Tanprasert, Varin Achariyakulporn

This paper proposes a new investigation on Gaussian mixture model (GMM) by comparing it with some preliminary experiments on multilayered perceptron network (MLP) with backpropagation learning algorithm (BKP) and dynamic time warping (DTW) techniques on Thai text-dependent speaker identification system. Three major identification engines are conducted on 50 speakers with isolated digits 0-9. Tr...
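The dynamic time warping (DTW) technique named in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the Euclidean frame distance, the unconstrained step pattern, and the helper names `dtw_distance` and `identify` are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    def frame_dist(x, y):
        # Euclidean distance between two frames (an assumed local cost).
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b frame repeats
                                 cost[i][j - 1],      # b advances, a frame repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def identify(templates, utterance):
    """Return the (hypothetical) speaker label whose template is closest under DTW."""
    return min(templates, key=lambda spk: dtw_distance(templates[spk], utterance))
```

In a text-dependent setting, each enrolled speaker would contribute one or more template sequences (for example, MFCC frames for a spoken digit), and a test utterance is assigned to the speaker with the lowest accumulated cost.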

Entropy based combination of tandem representations for noise robust ASR

2004

Shajith Ikbal, Hemant Misra, Sunil Sivadas, Hynek Hermansky, Hervé Bourlard

In this paper, we present an entropy based method to combine tandem representations of the recently proposed Phase AutoCorrelation (PAC) based features and Mel-Frequency Cepstral Coefficients (MFCC) features. PAC based features, derived from a nonlinear transformation of autocorrelation coefficients and shown to be noise robust, improve their robustness to additive noise in their tandem represen...
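The abstract is cut off before the combination rule is described, so the following is only a sketch of one plausible entropy-based scheme, inverse-entropy weighting of per-frame class posteriors. It is an assumption, not the authors' stated procedure; the function names and the `eps` guard are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete posterior distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def inverse_entropy_combine(streams, eps=1e-10):
    """Combine posterior vectors from several feature streams for one frame.

    Each stream is weighted by the inverse of its entropy, so a confident
    (low-entropy) stream contributes more than an uncertain one.
    `eps` guards against division by zero for a one-hot posterior.
    """
    inv = [1.0 / (entropy(p) + eps) for p in streams]
    total = sum(inv)
    weights = [v / total for v in inv]
    n_classes = len(streams[0])
    combined = [sum(w * p[k] for w, p in zip(weights, streams))
                for k in range(n_classes)]
    return combined, weights
```

Because the weights sum to one and each input is a probability distribution, the combined vector is again a valid distribution.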

Integrating Complementary Features with a Confidence Measure for Speaker Identification

2006

Nengheng Zheng, Pak-Chung Ching, Ning Wang, Tan Lee

This paper investigates the effectiveness of integrating complementary acoustic features for improved speaker identification performance. The complementary contributions of two acoustic features, i.e. the conventional vocal tract related features MFCC and the recently proposed vocal source related features WOCOR, for speaker identification are studied. An integrating system, which performs a sc...

PERCEPTUAL TIME-VARYING MODELLING OF SPEECH SIGNALS FOR ASR COMPRESSION APPLICATION (MonAmOR3)

2005

Amir Leibman, Ilan D. Shallom

Perceptual audio coders and Automatic Speech Recognition (ASR) systems are commonly based on short-time analysis. This paper presents a generalized model for time-varying coefficients based on psychoacoustic properties of the human ear. The proposed model is evaluated in the framework of speaker independent speech recognition using Hidden Markov Models (HMM). The generalized model is compared t...

On Feature Selection for Speaker Verification

2002

Arnon Cohen, Yaniv Zigel

This paper describes an HMM based speaker verification system, which verifies speakers in their own specific feature space. This ‘individual’ feature space is determined by a Dynamic Programming (DP) feature selection algorithm. A suitable criterion, correlated with Equal Error Rate (EER) was developed and is used for this feature selection algorithm. The algorithm was evaluated on a text-depen...
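For readers unfamiliar with the Equal Error Rate (EER) mentioned here, the sketch below shows one simple way to estimate it from verification scores. It does not reproduce the paper's EER-correlated selection criterion, which the truncated abstract does not specify; the threshold sweep and the function name `equal_error_rate` are assumptions.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from genuine (target) and impostor trial scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the observed threshold where false acceptance and false rejection
    rates are closest, and reported as their average.
    """
    best_gap, eer = float("inf"), None
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < thr for s in genuine) / len(genuine)     # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

With only a handful of scores the estimate is coarse, since the two error rates can only change at observed score values.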
