MFCC coefficients

Cepstral domain voice activity detection for improved noise modeling in MMSE feature enhancement for ASR

2008

Svein Gunnar Pettersen, Magne Hallstein Johnsen

In this paper we investigate the use of voice activity detection (VAD) for improving noise models used for cepstral domain minimum mean squared error (MMSE) filtering of noisy speech. Due to the popularity of MFCC features for speech recognition, it is useful to have VAD methods and MMSE filtering algorithms that both work in the MFCC domain. We propose a method for VAD based on the likelihood ...


Konuşma Tanıma İçin Heteroskedastik Ayırtaç Analizinin Düzenlileştirilmesi (Regularizing Heteroschedastic Discriminant Analysis for Speech Recognition)

2005

Hakan Erdoğan

Linear Discriminant Analysis (LDA) followed by a diagonalizing maximum likelihood linear transform (MLLT) applied to spliced static MFCC features yields important performance gains as compared to MFCC+dynamic features in most speech recognition tasks. It is reasonable to regularize LDA transform computation for stability. In this paper, we regularize LDA and heteroschedastic LDA transforms usin...

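The "spliced static MFCC features" mentioned in the abstract above are commonly built by stacking each frame with its neighbouring frames before a transform such as LDA is applied. The following is a minimal sketch of that splicing step only, not the paper's method: the function name `splice_frames`, the context width of 4 frames, and the edge-padding choice are assumptions made for illustration.

```python
import numpy as np


def splice_frames(features, context=4):
    """Stack each frame with `context` frames on either side.

    features: array of shape (T, D), one static feature vector per frame.
    Returns an array of shape (T, (2 * context + 1) * D).
    Edge frames are repeated at the boundaries (an assumption; other
    padding schemes are possible).
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    # Repeat the first and last frame so every frame has a full context window.
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    # Column block i holds the frame at offset (i - context) from the centre.
    blocks = [padded[i : i + num_frames] for i in range(2 * context + 1)]
    return np.hstack(blocks)
```

With 13 static coefficients and a context of 4, each spliced vector has 9 × 13 = 117 dimensions, which a projection such as LDA would then reduce.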

Formant trajectories for acoustic-to-articulatory inversion

2009

I. Yücel Özbek, Mark Hasegawa-Johnson, Mübeccel Demirekler

This work examines the utility of formant frequencies and their energies in acoustic-to-articulatory inversion. For this purpose, formant frequencies and formant spectral amplitudes are automatically estimated from audio, and are treated as observations for the purpose of estimating electromagnetic articulography (EMA) coil positions. A mixture Gaussian regression model with mel-frequency cepst...


Comparative performance analysis of statistical trajectory models in cellular environment

1997

Bojan Petek, Ove Andersen, Paul Dalsgaard

Two systems (Statistical Trajectory Models (STM) and continuous density HMMs) utilizing three preprocessing methodologies (MFCC, RASTA and FBDYN) were evaluated on two databases, namely CTIMIT and the corresponding downsampled TIMIT. Within the bounds of the experimental setup the comparative performance analysis showed that the STM significantly outperforms the HMM system on the CTIMIT databas...


Time and Frequency Filtering for Speech Recognition in Real Noise Conditions

2001

Dušan Macho, Climent Nadeu, Javier Hernando, Jaume Padrell

MFCCs perform well when used for clean speech recognition. However, for noisy speech the recognition rates go down. Augmenting the MFCC feature vector by dynamic features improves both discrimination and robustness of the MFCC-based recognizer. In this paper, we present an alternative parameterization based on the frequency filtering (FF) technique. By using FF, a significant improvement with ...

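The dynamic features mentioned in the abstract above are usually delta and delta-delta coefficients appended to the static MFCC vector. The sketch below uses the common regression formula d_t = Σ n (c_{t+n} − c_{t−n}) / (2 Σ n²) for n = 1..N. It illustrates the MFCC+dynamic baseline only, not the frequency filtering technique the paper proposes; the function names, the window half-width N = 2, and the edge padding are assumptions.

```python
import numpy as np


def delta(features, N=2):
    """Regression-based delta coefficients over a window of +/- N frames.

    features: array of shape (T, D). Returns an array of the same shape.
    Boundary frames are handled by repeating the first and last frame.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features)
    for n in range(1, N + 1):
        ahead = padded[N + n : N + n + num_frames]    # c_{t+n}
        behind = padded[N - n : N - n + num_frames]   # c_{t-n}
        out += n * (ahead - behind)
    return out / denom


def add_dynamic_features(mfcc, N=2):
    """Append delta and delta-delta features to static MFCCs."""
    d1 = delta(mfcc, N)
    d2 = delta(d1, N)
    return np.hstack([mfcc, d1, d2])
```

For 13 static coefficients this yields the familiar 39-dimensional vector per frame.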

Beginning of utterance detection algorithm for low complexity ASR engines

2004

Tommi Lahti

In this paper, a novel method for beginning of utterance detection is proposed for low complexity ASR systems. Assuming MFCC calculations in the ASR front-end, the additional computational load due to the algorithm is negligible. The algorithm makes use of the delay between the MFCC calculation and decoding process, which is typical in front-ends with feature normalization. The main steps of th...


Multi-Devices Hindi Speech Database for Speaker Identification using GMM

2013

Sonu Kumar, Mahesh Chandra

In this paper, we study the effect on a speaker identification (SI) system when speech data is recorded in parallel on two different sensors, an HP Pavilion third-generation laptop and a Samsung mobile (S3770K), both with built-in microphones, in a closed room in a noise-free environment. The database contains 10 Hindi sentences (50-60 seconds speech) and one English sentence (7-8 seconds sp...


Improvement of Text Dependent Speaker Identification System Using Neuro-Genetic Hybrid Algorithm in Office Environmental Conditions

Journal: CoRR 2009

Md. Rabiul Islam, Md. Fayzur Rahman

In this paper, an improved strategy for an automated text-dependent speaker identification system in noisy environments is proposed. The identification process incorporates the Neuro-Genetic hybrid algorithm with cepstral-based features. A Wiener filter is used to remove background noise from the source utterance. Different speech pre-processing techniques such as start-end point dete...


Robustness to additive noise of locally-normalized cepstral coefficients in speaker verification

2015

Josué Fredes, José Novoa, Víctor Poblete, Simon King, Richard M. Stern, Néstor Becerra Yoma

In this paper, the performance of a new feature set, Locally Normalized Cepstral Coefficients (LNCC), is evaluated for a speaker verification task with short testing utterances in additive noise. The results presented here show that LNCC outperforms baseline MFCC features when SNR is lower than 15 dB. The average relative reduction in EER achieved by LNCC is 33%. The use of LNCC in combination wi...


Distributed Speech Recognition Using TRAPS-estimated Manne...

2002

Pratibha Jain, Brian Kingsbury

In this paper, we investigate the use of TemPoRal PatternS (TRAPS) classifiers for estimating manner of articulation features on the small-vocabulary Aurora-2002 database. By combining a stream of TRAPS-estimated manner features with a stream of noise-robust MFCC features (earlier proposed in the Aurora-2002 evaluation by OGI, ICSI and Qualcomm), we obtain an average absolute improvement of 0.4...
