Estimation of speech lip features from discrete cosine transform
Authors
Abstract
This study is a contribution to the field of visual speech processing. It focuses on the automatic extraction of speech lip features from natural lips. The method directly predicts these features from predictors derived from an adequate transformation of the pixels in the lip region of interest. The transformation combines a 2-D Discrete Cosine Transform with a Principal Component Analysis applied to a subset of the DCT coefficients corresponding to about 1% of the total DCTs. The results show that the geometric lip features can be estimated with good accuracy (a root-mean-square error of 1 to 1.4 mm for the lip aperture and the lip width) using a reduced set of predictors derived from the PCA.
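The pipeline described above (2-D DCT on the lip ROI, retention of a small low-frequency subset of coefficients, PCA reduction, then prediction of geometric lip features) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the ROI size, the retained coefficient block, the number of components, and the linear regressor are all illustrative assumptions.

```python
# Hedged sketch: 2-D DCT on a lip ROI, keep a small low-frequency
# block of coefficients (on the order of 1% of the total), reduce
# with PCA, then linearly regress two geometric lip features
# (stand-ins for lip aperture and lip width). All data are synthetic.
import numpy as np
from scipy.fft import dctn

rng = np.random.default_rng(0)

def dct_features(roi, k=8):
    """2-D DCT of the ROI; keep the k x k low-frequency corner."""
    coeffs = dctn(roi, norm="ortho")
    return coeffs[:k, :k].ravel()

# Synthetic "lip ROI" images (64 x 64): 8 x 8 = 64 of 4096
# coefficients kept, i.e. ~1.5% of the total.
n = 200
rois = rng.normal(size=(n, 64, 64))
X = np.array([dct_features(r) for r in rois])   # n x 64 predictors

# PCA on the retained DCT coefficients (via SVD of centered data).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
n_comp = 10
scores = Xc @ Vt[:n_comp].T                     # n x n_comp PCA scores

# Synthetic ground truth: two features linearly related to the scores.
W = rng.normal(size=(n_comp, 2))
y = scores @ W + 0.1 * rng.normal(size=(n, 2))

# Least-squares prediction of the two lip features from PCA scores,
# and the per-feature root-mean-square error.
beta, *_ = np.linalg.lstsq(scores, y, rcond=None)
pred = scores @ beta
rmse = np.sqrt(((pred - y) ** 2).mean(axis=0))
print(rmse)
```

On real data the targets would be measured lip aperture and width in millimeters; here the small residual only reflects the synthetic linear relationship plus noise.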
Related papers
Speaker-Independent Visual Lip Activity Detection for Human-Computer Interaction
Recently there has been increased interest in using visual features for improved speech processing. Lip reading plays a vital role in visual speech processing. In this paper, a new approach for lip reading is presented. Visual speech recognition is applied in mobile phone applications, human-computer interaction, and the recognition of words spoken by hearing-impaired persons. The visual spe...
Full text
Gender classification via lips: static and dynamic features
Automatic gender classification has many security and commercial applications. Various modalities have been investigated for gender classification, with face-based classification being the most popular. In some real-world scenarios the face may be partially occluded. In these circumstances, a classification based on individual parts of the face, known as local features, must be adopted. We investig...
Full text
Audio-Visual Speech Processing System for Polish with Dynamic Bayesian Network Models
In this paper we describe a speech processing system for Polish which utilizes both acoustic and visual features and is based on Dynamic Bayesian Network (DBN) models. Visual modality extracts information from speaker lip movements and is based alternatively on raw pixels and discrete cosine transform (DCT) or Active Appearance Model (AAM) features. Acoustic modality is enhanced by using two pa...
Full text
Automatic Lip Reading for Daily Indonesian Words Based on Frame Difference and Horizontal-vertical Image Projection
Automatic lip reading is one of the research areas being actively developed. It has been used for various purposes, such as enhancing speech recognition and aiding speech training for the deaf. There are two approaches to lip feature extraction, namely appearance based and shape based. The appearance-based approach is usually better, because it provides visual features that cover not only li...
Full text
Noise-invariant representation for speech signals
A new group-delay based spectral domain is explored for representation of speech signals and for extraction of robust features. The spectrum is computed using the group-delay functions defined on the autocorrelation of a short segment of speech. The features derived from this spectrum are easy to compute and are robust to the background noise. The invariance of the spectral shape to noise in th...
Full text