cepstral

An Utterance Recognition Technique for Keyword Spotting by Fusion of Bark Energy and MFCC Features

2009

K. Gopalan Tao Chu Xiaofeng Miao

This paper describes the preliminary results of a keyword spotting system using a fusion of spectral and cepstral features. Spectral energy in 16 frequency bands on the Bark scale and 16 mel-scale warped cepstral coefficients are used independently and in combination, with appropriate weights, for recognizing word utterances. Results of matching features using Euclidean and cosine distances in a...
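
The weighted combination of feature distances mentioned in this abstract could be sketched as follows. This is a minimal illustration, not the authors' method: the function names, the `w_bark` weight, and the choice to fuse at the distance level (rather than at the feature level) are all assumptions, since the abstract does not specify them.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def fused_distance(bark_a, bark_b, mfcc_a, mfcc_b,
                   w_bark=0.5, distance=euclidean_distance):
    """Weighted sum of the Bark-energy distance and the MFCC distance.

    w_bark is a hypothetical weight in [0, 1]; the MFCC term gets 1 - w_bark.
    """
    return (w_bark * distance(bark_a, bark_b)
            + (1.0 - w_bark) * distance(mfcc_a, mfcc_b))
```

A template utterance with the smallest fused distance to the test utterance would then be taken as the match; passing `distance=cosine_distance` switches the metric.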

Stereo Vision Head Vergence using GPU Cepstral Filtering

2011

Luís Almeida Paulo Menezes Jorge Dias

Vergence ability is an important visual behavior observed in living creatures when they use vision to interact with the environment. The notion of an active observer is equally useful for robotic vision systems in tasks such as object tracking, fixation and 3D environment structure recovery. Humanoid robots are a potential playground for such behaviors. This paper describes the implementation of a ...

Extended powered cepstral normalization (p-CN) with range equalization for robust features in speech recognition

2007

Chang-Wen Hsu Lin-Shan Lee

Cepstral normalization is widely used as a powerful approach to producing robust features for speech recognition. A new approach, Powered Cepstral Normalization (P-CN), was recently proposed to normalize the MFCC parameters in the r1-th order powered domain, where r1 > 1.0, and then transform the features back by a 1/r2 power order to a better recognition domain, and it was shown to pr...

Speaker Normalization with All-pass Transforms (Center for Language and Speech Processing)

1998

John W. McDonough Alan V. Oppenheim Philip E. Gill Walter Murray John R. Deller John G. Proakis

Speaker normalization is a process in which the short-time features of speech from a given speaker are transformed so as to better match some speaker independent model. Vocal tract length normalization (VTLN) is a popular speaker normalization scheme wherein the frequency axis of the short-time spectrum associated with a particular speaker’s speech is rescaled or warped prior to the extraction ...

Language and genre detection in audio content analysis

2008

Vikramjit Mitra Daniel Garcia-Romero Carol Y. Espy-Wilson

This paper presents an audio genre detection framework that can be used for a multi-language audio corpus. Cepstral coefficients are considered and analyzed as the feature set for both language-dependent and language-independent genre identification (GID) tasks. Language information is found to increase the overall detection accuracy by at least 2.6% on average from its language independent...

Rhythm based music segmentation and octave scale cepstral features for sung language recognition

2008

Namunu Chinthaka Maddage Haizhou Li

Sung language recognition relies on both effective feature extraction and acoustic modeling. In this paper, we study rhythm based music segmentation, in which the frame size varies in proportion to the inter-beat interval of the music, in contrast to the fixed length segmentation (FIX) used in spoken language recognition. We show that acoustic features extracted under the BSS scheme outperform those from FIX. ...

Speech Recognition of Phones Using Feature Streams

2007

Todd A. Stephenson

Artificial neural networks (ANNs) have been used to classify phonetic features in speech. The feature streams from the ANNs are used here as the observations for Hidden Markov Models (HMMs). Using such observations allows us to build a competitive speech recogniser. This recogniser is compared to a similar recogniser that was trained on mel-frequency cepstral coefficients (MFCCs). While the cepstr...

A multiresolutionally oriented approach for determination of cepstral features in speech recognition

1997

Simon Dobrisek France Mihelic Nikola Pavesic

This paper presents an effort to provide a more efficient speech signal representation, which aims to be incorporated into an automatic speech recognition system. Modified cepstral coefficients, derived from a multiresolution auditory spectrum are proposed. The multiresolution spectrum was obtained using sliding single point discrete Fourier transformations. It is shown that the obtained spectr...
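
The sliding single-point DFT mentioned in this abstract can be illustrated with the standard sliding-DFT recurrence, which updates one frequency bin per incoming sample instead of recomputing the whole transform. This is only a sketch of the general technique: the function names and the window handling are assumptions, and the paper's own multiresolution formulation is not given in the snippet.

```python
import cmath
from collections import deque


def direct_dft_bin(window, k):
    """Bin k of the DFT of `window`, computed directly from the definition."""
    n = len(window)
    return sum(x * cmath.exp(-2j * cmath.pi * k * m / n)
               for m, x in enumerate(window))


def sliding_dft_bin(signal, k, n):
    """Return bin k of the length-n DFT for every window position in `signal`.

    The first window is computed directly; each later value uses the
    recurrence X(t) = (X(t-1) - oldest_sample + newest_sample) * e^{j*2*pi*k/n}.
    """
    if len(signal) < n:
        raise ValueError("signal must contain at least n samples")
    twiddle = cmath.exp(2j * cmath.pi * k / n)
    window = deque(signal[:n])
    value = direct_dft_bin(window, k)
    results = [value]
    for new_sample in signal[n:]:
        old_sample = window.popleft()
        window.append(new_sample)
        value = (value - old_sample + new_sample) * twiddle
        results.append(value)
    return results
```

Running several such bins with different window lengths `n` is one plausible way to obtain different time-frequency resolutions per band, though that mapping is a guess rather than something stated above.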

Application of shifted delta cepstral features in speaker verification

2007

José Ramón Calvo de Lara Rafael Fernández Gabriel Hernández

Recently, the Shifted Delta Cepstral (SDC) feature was reported to outperform the delta and delta-delta features in cepstral feature based language identification (LID) systems [1, 2]. This paper examines the application of SDC features in speaker verification and evaluates their robustness to channel mismatch, manner of speaking and session variability. The result of the experim...
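
For readers unfamiliar with SDC features, the usual construction stacks `k` delta vectors, each computed across a span of `2d` frames and shifted by `p` frames from the previous one. The sketch below follows that common definition; the default parameter values, the function name, and the clamping of out-of-range frame indices at the utterance edges are illustrative assumptions, not settings taken from this paper.

```python
def shifted_delta_cepstra(frames, d=1, p=3, k=7):
    """Compute SDC vectors from a list of cepstral frames.

    frames: list of equal-length lists of floats, one per time frame.
    For frame t and block i (0 <= i < k), the delta is
        c(t + i*p + d) - c(t + i*p - d),
    and the k deltas are concatenated into one vector of length k * len(frame).
    Indices outside the utterance are clamped to the first or last frame
    (an assumption; other edge policies are possible).
    """
    last = len(frames) - 1

    def frame_at(index):
        return frames[min(max(index, 0), last)]

    sdc = []
    for t in range(len(frames)):
        stacked = []
        for i in range(k):
            ahead = frame_at(t + i * p + d)
            behind = frame_at(t + i * p - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        sdc.append(stacked)
    return sdc
```

Each output vector therefore summarizes how the cepstrum changes over a window of roughly `(k - 1) * p + 2 * d` frames, which is considerably longer than the span covered by ordinary delta and delta-delta features.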

A fast approach to psychoacoustic model compensation for robust speaker recognition in additive noise

2015

Ashish Panda

This paper addresses the problem of speaker verification in the presence of additive noise. We propose a fast implementation of the Psychoacoustic Model Compensation (Psy-Comp) scheme for static features, along with model domain mean and variance normalization, for robust speaker recognition in noisy conditions. The proposed algorithms are validated through experiments on noise corrupted NIST-2000 sp...
