Estimation of speech lip features from discrete cosine transform
Authors
Abstract
This study is a contribution to the field of visual speech processing. It focuses on the automatic extraction of speech lip features from natural lips. The method directly predicts these features from predictors derived from an adequate transformation of the pixels in the lip region of interest. The transformation combines a 2-D Discrete Cosine Transform with a Principal Component Analysis applied to a subset of the DCT coefficients corresponding to about 1% of the total DCTs. The results show that the geometric lip features can be estimated with good accuracy (a root-mean-square error of 1 to 1.4 mm for the lip aperture and the lip width) using a reduced set of predictors derived from the PCA.
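The pipeline described above (2-D DCT on the lip ROI, retention of a small low-frequency subset of coefficients, PCA reduction, then prediction of geometric lip features) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the ROI size, the retained coefficient block, the number of components, and the linear regressor are all illustrative assumptions.

```python
# Hedged sketch: 2-D DCT on a lip ROI, keep a small low-frequency
# block of coefficients (on the order of 1% of the total), reduce
# with PCA, then linearly regress two geometric lip features
# (stand-ins for lip aperture and lip width). All data are synthetic.
import numpy as np
from scipy.fft import dctn

rng = np.random.default_rng(0)

def dct_features(roi, k=8):
    """2-D DCT of the ROI; keep the k x k low-frequency corner."""
    coeffs = dctn(roi, norm="ortho")
    return coeffs[:k, :k].ravel()

# Synthetic "lip ROI" images (64 x 64): 8 x 8 = 64 of 4096
# coefficients kept, i.e. ~1.5% of the total.
n = 200
rois = rng.normal(size=(n, 64, 64))
X = np.array([dct_features(r) for r in rois])   # n x 64 predictors

# PCA on the retained DCT coefficients (via SVD of centered data).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
n_comp = 10
scores = Xc @ Vt[:n_comp].T                     # n x n_comp PCA scores

# Synthetic ground truth: two features linearly related to the scores.
W = rng.normal(size=(n_comp, 2))
y = scores @ W + 0.1 * rng.normal(size=(n, 2))

# Least-squares prediction of the two lip features from PCA scores,
# and the per-feature root-mean-square error.
beta, *_ = np.linalg.lstsq(scores, y, rcond=None)
pred = scores @ beta
rmse = np.sqrt(((pred - y) ** 2).mean(axis=0))
print(rmse)
```

On real data the targets would be measured lip aperture and width in millimeters; here the small residual only reflects the synthetic linear relationship plus noise.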
Related papers
Speaker-Independent Visual Lip Activity Detection for Human-Computer Interaction
Recently there has been increased interest in using visual features for improved speech processing. Lip reading plays a vital role in visual speech processing. In this paper, a new approach for lip reading is presented. Visual speech recognition is applied in mobile phone applications, human-computer interaction, and the recognition of words spoken by hearing-impaired persons. The visual spe...
Full text
Gender classification via lips: static and dynamic features
Automatic gender classification has many security and commercial applications. Various modalities have been investigated for gender classification, with face-based classification being the most popular. In some real-world scenarios the face may be partially occluded. In these circumstances, a classification based on individual parts of the face, known as local features, must be adopted. We investig...
Full text
Audio-Visual Speech Processing System for Polish with Dynamic Bayesian Network Models
In this paper we describe a speech processing system for Polish which utilizes both acoustic and visual features and is based on Dynamic Bayesian Network (DBN) models. Visual modality extracts information from speaker lip movements and is based alternatively on raw pixels and discrete cosine transform (DCT) or Active Appearance Model (AAM) features. Acoustic modality is enhanced by using two pa...
Full text
Automatic Lip Reading for Daily Indonesian Words Based on Frame Difference and Horizontal-vertical Image Projection
Automatic lip reading is one of the research areas being actively developed. It has been used for various purposes, such as enhancing speech recognition and aiding speech training for the deaf. There are two approaches to lip feature extraction, namely appearance based and shape based. The appearance-based approach is usually better, because it provides visual features that cover not only li...
Full text
Noise-invariant representation for speech signals
A new group-delay based spectral domain is explored for representation of speech signals and for extraction of robust features. The spectrum is computed using the group-delay functions defined on the autocorrelation of a short segment of speech. The features derived from this spectrum are easy to compute and are robust to the background noise. The invariance of the spectral shape to noise in th...
Full text