Search results for: speech feature extraction

Number of results: 480,138

1997
Carlos Avendano, Sangita Tibrewala

To overcome the problems associated with the long impulse responses produced by reverberation, we use a long-time-window (high-frequency-resolution) analysis during the channel normalization steps of the feature extraction process in automatic speech recognition (ASR). After normalization, a trade-off between frequency and time resolution is used to increase the rate at which the time information is ...

2007
Thomas Hueber, Gérard Chollet, Bruce Denby, Gérard Dreyfus, Maureen Stone

The article describes a video-only speech recognition system for a “silent speech interface” application, using ultrasound and optical images of the vocal tract. A one-hour audiovisual speech corpus was phonetically labeled using an automatic speech alignment procedure and robust visual feature extraction techniques. HMM-based stochastic models were estimated separately on the visual and acoust...

2016
Masood Delfarah, DeLiang Wang

Monaural speech separation in reverberant conditions is very challenging. In masking-based separation, features extracted from speech mixtures are employed to predict a time-frequency mask. Robust feature extraction is crucial for the performance of supervised speech separation in adverse acoustic environments. Using objective speech intelligibility as the metric, we investigate a wide variety ...
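Masking-based separation of the kind described in this entry trains a model to predict a time-frequency mask from features of the mixture. As a hedged illustration (not the authors' implementation), a common training target in this line of work is the ideal ratio mask (IRM), sketched here in NumPy from toy magnitude spectrograms:

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Ideal ratio mask: per time-frequency unit, the ratio of speech
    energy to total (speech + noise) energy, raised to the power beta.
    speech_mag, noise_mag: magnitude spectrograms (freq x time)."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return (s2 / (s2 + n2 + 1e-12)) ** beta  # epsilon avoids divide-by-zero

# Toy example: 3 frequency bins x 2 time frames
speech = np.array([[1.0, 2.0], [0.5, 0.0], [3.0, 1.0]])
noise  = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 1.0]])
mask = ideal_ratio_mask(speech, noise)
# mask values lie in [0, 1]; applying mask * mixture_mag attenuates
# noise-dominated units and keeps speech-dominated ones.
```

In a supervised system, a network is trained to predict this mask from extracted features; the abstract's point is that the choice of features largely determines robustness in reverberant and noisy conditions.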

2016
Alicia Lozano-Diez, Anna Silnova, Pavel Matějka, Ondřej Glembek, Oldřich Plchot, Jan Pešán, Lukáš Burget, Joaquin Gonzalez-Rodriguez

Recently, Deep Neural Network (DNN) based bottleneck features proved to be very effective in i-vector based speaker recognition. However, bottleneck feature extraction is usually optimized for the speech recognition task rather than for speaker recognition. In this paper, we explore whether DNNs that are suboptimal for speech recognition can provide better bottleneck features for speaker recognition. We experimen...

2008
Yasunari Obuchi, Masahito Togami, Takashi Sumiyoshi

We introduce a new class of speech processing, called Intentional Voice Command Detection (IVCD). To achieve a completely hands-free speech interface, it is necessary to reject not only noise but also unintended voices. The conventional voice activity detection (VAD) framework is not sufficient for this purpose, and we discuss how IVCD should be defined and how it can be realized. We investigate implementation of IVCD from the v...

2002
Y. S. Naous, G. F. Choueiter, M. I. Ohannessian, M. A. Al-Alaoui

In this paper, we describe a typical approach to implementing a voice-based control solution. Isolated-word speech recognition is performed using cepstral feature extraction and hidden Markov modeling of speech. The merit of this document lies in the amalgamation of the simplest yet most successful relevant methods into a coherent design guideline, aiming to trivialize the integration of speec...
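The cepstral feature extraction mentioned in this entry can be sketched in a few lines. The following is a minimal illustration of generic log-spectral cepstra (framing, windowing, log magnitude spectrum, DCT-II), not the paper's exact front end; in particular it omits the mel filterbank used in standard MFCCs:

```python
import numpy as np

def dct2(x):
    """Orthonormal DCT-II along the last axis, in pure NumPy."""
    N = x.shape[-1]
    n = np.arange(N)
    k = n.reshape(-1, 1)
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * N))  # basis[k, n]
    scale = np.full(N, np.sqrt(2.0 / N))
    scale[0] = np.sqrt(1.0 / N)
    return (x @ basis.T) * scale

def cepstral_features(signal, frame_len=256, hop=128, n_ceps=13):
    """Frame -> Hamming window -> log magnitude spectrum -> DCT -> cepstra."""
    frames = [signal[s:s + frame_len]
              for s in range(0, len(signal) - frame_len + 1, hop)]
    frames = np.array(frames) * np.hamming(frame_len)
    spec = np.abs(np.fft.rfft(frames, axis=1))
    log_spec = np.log(spec + 1e-10)        # epsilon guards log(0)
    return dct2(log_spec)[:, :n_ceps]      # keep low-order cepstra

# 1 s of a synthetic 440 Hz tone sampled at 8 kHz
t = np.arange(8000) / 8000.0
feats = cepstral_features(np.sin(2 * np.pi * 440 * t))
```

Each row of `feats` is one frame's cepstral vector; in an HMM-based recognizer like the one described above, such vectors (typically with deltas appended) form the observation sequence.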

1985
Chia-ying Lee, James R. Glass, Oded Ghitza, Hung-an Chang, Ekapol Chuangsuwanich, Yuan Shen, Stephen Shum, Yaodong Zhang

A closed-loop, auditory-based speech feature extraction algorithm is presented to address the problem of unseen noise in robust speech recognition. This closed-loop model is inspired by the possible role of the medial olivocochlear (MOC) efferent system of the human auditory periphery, which has been suggested in [6, 13, 42] to be important for human speech intelligibility in noisy environment....

1998
Juergen Luettin

We address the problem of robust lip tracking, visual speech feature extraction, and sensor integration for audiovisual speech recognition applications. An appearance-based model of the articulators, which represents linguistically important features, is learned from example images and is used to locate, track, and recover visual speech information. We tackle the problem of joint temporal mo...

2010
Nishi Sharma, Parminder Singh

This paper presents an automatic speech segmentation (ASS) technique to segment spontaneous speech into syllable-like units. In the development of a syllable-centric ASS system, segmentation of the acoustic signal into syllabic units is an important stage. In this paper we focus on identifying the minimum unit of speech to be considered when training any speech recognition system. There are sy...

Journal: CoRR, 2016
Toni Heidenreich, Michael W. Spratling

Visual speech recognition aims to identify the sequence of phonemes from continuous speech. Unlike the traditional approach of using 2D image feature extraction methods to derive features of each video frame separately, this paper proposes a new approach using a 3D (spatio-temporal) Discrete Cosine Transform to extract features of each feasible sub-sequence of an input video which are subsequen...
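The 3D (spatio-temporal) DCT used as the feature transform in this entry is separable, so it can be computed by applying a 1D DCT-II along each of the three axes in turn. The following NumPy sketch illustrates the transform itself only; the paper's sub-sequence selection and subsequent processing of the coefficients are not reproduced here:

```python
import numpy as np

def dct_axis(x, axis):
    """Orthonormal DCT-II along one axis via a matrix multiply."""
    N = x.shape[axis]
    n = np.arange(N)
    k = n.reshape(-1, 1)
    D = np.cos(np.pi * (2 * n + 1) * k / (2 * N))  # D[k, n]
    D *= np.sqrt(2.0 / N)
    D[0] *= np.sqrt(0.5)  # row 0 scale becomes sqrt(1/N)
    return np.apply_along_axis(lambda v: D @ v, axis, x)

def dct3(volume):
    """Separable 3D (spatio-temporal) DCT of a video sub-sequence,
    shaped (frames, height, width)."""
    out = volume.astype(float)
    for ax in range(3):
        out = dct_axis(out, ax)
    return out

# Toy video clip: 4 frames of 8x8 pixels
clip = np.random.default_rng(0).random((4, 8, 8))
coeffs = dct3(clip)
```

Because the transform is orthonormal, energy is preserved and concentrated in the low-order coefficients, which is what makes truncating to a small coefficient block a reasonable feature-compression step.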
