Speaker-independent machine lip-reading with speaker-dependent viseme classifiers
Abstract
In machine lip-reading, that is, the identification of speech from visual information alone, there is evidence that visual speech is highly speaker-dependent [1]. Here, we use a phoneme-clustering method to form new phoneme-to-viseme maps for both individual speakers and groups of speakers. We use these maps to examine how similarly different speakers talk visually. We conclude that, broadly speaking, speakers share the same repertoire of mouth gestures; where they differ is in how they use those gestures.
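The abstract does not spell out the clustering rule. Purely as an illustration, the sketch below assumes a per-speaker phoneme confusion matrix (how often each phoneme was recognised as each other phoneme) and greedily merges the most mutually confused phoneme classes until a chosen number of visemes remains. It then compares two speakers' maps by the Jaccard overlap of phoneme pairs that share a class. The function names, the averaging rule, the similarity measure and the toy counts are all hypothetical choices, not the authors' method or data.

```python
from itertools import combinations

# Hypothetical sketch: the clustering rule and all names below are assumptions,
# not the method described in the paper.

def cluster_phonemes(confusion, phonemes, n_visemes):
    """Greedily merge phoneme classes into n_visemes viseme classes.

    confusion[i][j] counts how often phoneme i was recognised as phoneme j
    (an assumed per-speaker confusion matrix from a phoneme recogniser).
    """
    index = {p: i for i, p in enumerate(phonemes)}
    classes = [{p} for p in phonemes]  # start with one class per phoneme

    def affinity(a, b):
        # Mean symmetric confusion between every member of class a and class b.
        total = sum(confusion[index[x]][index[y]] + confusion[index[y]][index[x]]
                    for x in a for y in b)
        return total / (len(a) * len(b))

    while len(classes) > n_visemes:
        # Merge the pair of classes that are most often confused with each other.
        i, j = max(combinations(range(len(classes)), 2),
                   key=lambda ij: affinity(classes[ij[0]], classes[ij[1]]))
        classes[i] |= classes[j]
        del classes[j]  # j > i, so index i is still valid
    return classes


def co_members(viseme_map):
    """All unordered phoneme pairs that share a viseme class."""
    return {frozenset(pair) for c in viseme_map for pair in combinations(c, 2)}


def map_similarity(map_a, map_b):
    """Jaccard overlap of the co-member pairs of two phoneme-to-viseme maps."""
    pa, pb = co_members(map_a), co_members(map_b)
    if not pa and not pb:
        return 1.0
    return len(pa & pb) / len(pa | pb)


# Toy example with invented counts for five phonemes (not data from the paper).
phonemes = ["p", "b", "m", "f", "v"]
confusion = [
    [10, 6, 5, 0, 1],  # p
    [6, 9, 4, 1, 0],   # b
    [5, 5, 8, 0, 0],   # m
    [0, 1, 0, 12, 7],  # f
    [1, 0, 0, 6, 11],  # v
]
speaker_a = cluster_phonemes(confusion, phonemes, n_visemes=2)  # {p, b, m}, {f, v}
speaker_b = [{"p", "b"}, {"m"}, {"f", "v"}]  # an assumed map for a second speaker
print(map_similarity(speaker_a, speaker_b))  # 0.5
```

Under this toy measure, two speakers who group the same gestures together score 1.0, and partial agreement (here, both keep /f/ and /v/ together but only one merges /m/ with /p/ and /b/) gives an intermediate score.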
Similar resources
Finding phonemes: improving machine lip-reading
In machine lip-reading there is continued debate and research around the correct classes to be used for recognition. In this paper we use a structured approach for devising speaker-dependent viseme classes, which enables the creation of a set of phoneme-to-viseme maps where each has a different quantity of visemes ranging from two to 45. Viseme classes are based upon the mapping of articulated ...
Decoding visemes: improving machine lipreading (PhD thesis)
This thesis is about improving machine lip-reading, that is, the classification of speech from only the visual cues of a speaker. Machine lip-reading is a niche research problem in both speech processing and computer vision. Current challenges for machine lip-reading fall into two groups: the content of the video, such as the rate at which a person is speaking; or the parameters of the vid...
Lip Localization and Viseme Recognition from Video Sequences
Viseme (visual cue) recognition is one of the steps in building an automated lip-reading system. To recognize a viseme, one must first detect the speaker's lips in the video sequence and track them to extract the feature vectors for the final recognition. A novel method for lip localization based on color models has been proposed. Also, the basic possible li...
The challenge of multispeaker lip-reading
In speech recognition, the problem of speaker variability has been well studied. Common approaches to dealing with it include normalising for a speaker’s vocal tract length and learning a linear transform that moves the speaker-independent models closer to a new speaker. In pure lip-reading (no audio) the problem has been less well studied. Results are often presented that are based on speak...
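The snippet only names the linear-transform approach. As a rough illustration, here is a deliberately simplified sketch of moving speaker-independent model means towards a new speaker. It assumes we already have paired mean vectors (a speaker-independent mean and the new speaker's estimate for the same model state) and fits an independent affine map per feature dimension by least squares. MLLR-style adaptation instead estimates a full transform by maximum likelihood; the function names and toy numbers here are hypothetical.

```python
# Simplified stand-in for model-space linear adaptation (e.g. MLLR-style mean
# transforms): a per-dimension least-squares affine fit, not full MLLR.

def fit_diagonal_affine(si_means, speaker_means):
    """Fit speaker_mean[d] ≈ a[d] * si_mean[d] + b[d] for each feature dimension d.

    si_means and speaker_means are paired lists of mean vectors, e.g. one pair per
    model state, assumed to be already aligned to the same states.
    """
    n = len(si_means)
    params = []
    for d in range(len(si_means[0])):
        xs = [m[d] for m in si_means]
        ys = [m[d] for m in speaker_means]
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = sxy / sxx if sxx else 1.0  # fall back to unit scale if x is constant
        params.append((a, my - a * mx))
    return params


def adapt_mean(mean, params):
    """Move one speaker-independent mean vector towards the new speaker."""
    return [a * x + b for x, (a, b) in zip(mean, params)]


# Toy paired means (invented for illustration, not from any dataset).
si_means = [[0.0, 1.0], [1.0, 2.0], [2.0, 4.0]]
speaker_means = [[1.0, 0.5], [3.0, 1.5], [5.0, 3.5]]
params = fit_diagonal_affine(si_means, speaker_means)
adapted = adapt_mean([3.0, 3.0], params)  # approximately [7.0, 2.5]
```

A per-dimension fit needs far less adaptation data than a full matrix, at the cost of ignoring correlations between feature dimensions.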
Comparison of human and machine-based lip-reading
We investigate the performance of a machine-based lip-reading system using both shape-only parameters and full shape and appearance parameters. Furthermore, we contrast the performance of a machine-based lip-reading system with human lip-reading ability. We find that the automated system outperforms human lip-readers. Curiously, however, for relatively simple tasks there is little improvement in...