Which Phoneme-to-Viseme Maps Best Improve Visual-Only Computer Lip-Reading?
Authors
Abstract
A critical assumption of all current visual speech recognition systems is that there are visual speech units, called visemes, which can be mapped to units of acoustic speech, the phonemes. Although a number of maps have been published, their effectiveness is rarely tested, particularly for visual-only lip-reading (many works use audio-visual speech). Here we examine 120 mappings and consider whether any are stable across talkers. We show a method for devising maps based on phoneme confusions from an automated lip-reading system, and we present new mappings that show improvements for individual talkers.
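The abstract says only that maps are devised from phoneme confusions; it does not give the procedure. As a rough illustration of the general idea, and not the authors' method, the sketch below greedily merges phonemes that a recogniser confuses with one another into shared viseme classes. The function name `cluster_phonemes`, the averaging rule, the `min_score` threshold, and the toy confusion counts are all assumptions made for this example.

```python
from itertools import combinations


def cluster_phonemes(phonemes, confusions, min_score=1.0):
    """Greedily merge phoneme classes that are confused with each other.

    confusions[(p, q)] is a hypothetical count of how often phoneme p
    was recognised as phoneme q by a visual-only recogniser.
    """
    classes = [{p} for p in phonemes]

    def score(a, b):
        # Mean two-way confusion count over all cross-class phoneme pairs.
        total = sum(
            confusions.get((p, q), 0) + confusions.get((q, p), 0)
            for p in a
            for q in b
        )
        return total / (len(a) * len(b))

    while len(classes) > 1:
        # Pick the pair of classes with the highest mutual confusion.
        a, b = max(combinations(classes, 2), key=lambda pair: score(*pair))
        if score(a, b) < min_score:
            break  # Remaining classes are not confused often enough to merge.
        classes.remove(a)
        classes.remove(b)
        classes.append(a | b)

    # Label classes V1, V2, ... in a deterministic (alphabetical) order.
    ordered = sorted(classes, key=lambda c: sorted(c))
    return {p: f"V{i}" for i, c in enumerate(ordered, start=1) for p in sorted(c)}


# Toy, made-up confusion counts (not data from the paper).
toy_confusions = {
    ("p", "b"): 5, ("b", "p"): 4,
    ("p", "m"): 3, ("m", "b"): 2,
    ("f", "v"): 6, ("v", "f"): 5,
}
example_map = cluster_phonemes(["p", "b", "m", "f", "v", "k"], toy_confusions)
print(example_map)
```

With these toy counts the bilabials /p/, /b/, /m/ fall into one class, /f/ and /v/ into another, and /k/ stays on its own. Raising `min_score` yields more, smaller viseme classes, while lowering it yields fewer, larger ones.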
Related resources
Decoding visemes: improving machine lipreading (PhD thesis)
This thesis is about improving machine lip-reading, that is, the classification of speech from only the visual cues of a speaker. Machine lip-reading is a niche research problem in both speech processing and computer vision. Current challenges for machine lip-reading fall into two groups: the content of the video, such as the rate at which a person is speaking; or the parameters of the vid...
Speaker-independent machine lip-reading with speaker-dependent viseme classifiers
In machine lip-reading, the identification of speech from visual-only information, there is evidence that visual speech is highly dependent upon the speaker [1]. Here, we use a phoneme-clustering method to form new phoneme-to-viseme maps for both individual and multiple speakers. We use these maps to examine how similarly speakers talk visually. We conclude that broadly speaking, s...
Finding phonemes: improving machine lip-reading
In machine lip-reading there is continued debate and research around the correct classes to be used for recognition. In this paper we use a structured approach for devising speaker-dependent viseme classes, which enables the creation of a set of phoneme-to-viseme maps where each has a different quantity of visemes ranging from two to 45. Viseme classes are based upon the mapping of articulated ...
Primary research on the viseme system in Standard Chinese
The study of traditional phonetics indicates that the shape of the lips has an important effect on the articulation of consonants and vowels [1]. AVSP (Audio-Visual Speech Processing) can improve the naturalness of synthesized speech and the recognition rate of speech recognition systems. In computer-synthesized faces especially, lip-shape movements play a crucial role. The present research aims t...
Persian Viseme Classification Using Interlaced Derivative Patterns and Support Vector Machine
Viseme (visual phoneme) classification and analysis in every language are among the most important preliminaries for multimedia research such as talking heads, lip reading, lip synchronization, and computer-assisted pronunciation training applications. Because viseme analysis is a language-dependent process, we concentrated our research on Persian lan...