Audio-visual Interaction in Model Adaptation for Multi-modal Speech Recognition
Abstract
This paper investigates audio-visual interaction, i.e., inter-modal influences, in linear-regressive model adaptation for multi-modal speech recognition. In multi-modal adaptation, inter-modal information may contribute to speech recognition performance, so the influence and advantage of inter-modal elements should be examined. Experiments were conducted to evaluate several transformation matrices that include or exclude inter-modal and intra-modal elements, using noisy data from an audio-visual corpus. The experimental results clarify the importance of making effective use of audio-visual interaction.
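As a rough illustration of what including or excluding inter-modal elements could mean, assume an MLLR-style affine transform applied to a concatenated audio-visual mean vector. The partition below and all symbol names ($W_{AA}$, $W_{AV}$, $W_{VA}$, $W_{VV}$, $b$) are assumptions made for illustration, not notation taken from the paper.

```latex
% Assumed setup: the mean vector \mu stacks an audio part (A) and a visual part (V).
\hat{\mu} = W \mu + b, \qquad
\mu = \begin{pmatrix} \mu_A \\ \mu_V \end{pmatrix}, \qquad
W = \begin{pmatrix} W_{AA} & W_{AV} \\ W_{VA} & W_{VV} \end{pmatrix}
```

Under this assumption, $W_{AA}$ and $W_{VV}$ hold the intra-modal elements, while $W_{AV}$ and $W_{VA}$ hold the inter-modal elements. Forcing the off-diagonal blocks to zero would give a block-diagonal transform that adapts each modality independently.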
Similar resources
Stream weight estimation using higher order statistics in multi-modal speech recognition
In this paper, stream weight optimization for multi-modal speech recognition using audio and visual information is examined. The conventional multi-stream Hidden Markov Model (HMM) used in multi-modal speech recognition employs a constraint that the audio and visual weight factors sum to one. This means balance between transition and observation probabiliti...
A Robust Multi-modal Speech Recognition Method Using Optical-flow Analysis
This paper proposes a new multi-modal speech recognition method using optical-flow analysis and evaluates its robustness to acoustic and visual noise. Optical flow is defined as the distribution of apparent velocities in the movement of brightness patterns in an image. Since the optical flow is computed without extracting the speaker's lip contours and location, robust visual features can be obtaine...
Multi-Source Human ID
Motivation: Research in audio-visual speech recognition, e.g. [1, 5], shows that classification results can be improved by integrating evidence from multiple sources of observation. Similar results have been observed and demonstrated in verification systems, such as [6], as well as in multi-modal trackers, e.g. [7]. In the area of human identification, current state-of-the-art systems deliver ver...
An audio-visual corpus for multimodal speech recognition in dutch language
This paper describes the gathering and availability of an audio-visual speech corpus for the Dutch language. The corpus was prepared with multi-modal speech recognition in mind and is currently used in our research on lip-reading and bimodal speech recognition. It contains the prompts also used in the well-established POLYPHONE corpus and therefore captures the Dutch language characteristics...
Speaker adaptation for audio-visual speech recognition
In this paper, speaker adaptation is investigated for audiovisual automatic speech recognition (ASR) using the multistream hidden Markov model (HMM). First, audio-only and visual-only HMM parameters are adapted by combining maximum a posteriori and maximum likelihood linear regression adaptation. Subsequently, the audio-visual HMM stream exponents are adapted to better capture the reliability o...