Extracting an AV speech source f

Authors

  • David Sodoyer
  • Laurent Girin
  • Jean-Luc Schwartz
Abstract

We present a new approach to the source separation problem for multiple speech signals. Using the additional visual information provided by the speaker's face, the method aims to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker's lip movements. We define a statistical model of the joint probability of visual and spectral audio input to quantify the audio-visual coherence. Separation is then achieved by maximising this joint probability. Experiments on additive mixtures of 2, 3 and 5 sources show that the algorithm performs well, and systematically better than the classical BSS algorithm JADE.
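To make the separation criterion concrete, the following is a minimal Python sketch of the idea: a demixing vector is optimised so that the spectral features of the extracted signal, taken jointly with the observed lip parameters, are most probable under an audio-visual model trained beforehand on clean speech. The single-Gaussian joint model, the log band-energy features, the spectral_features/separate helpers and the Nelder-Mead optimiser are all illustrative assumptions, not the authors' actual implementation.

# A minimal, hypothetical sketch of the idea in the abstract: extract one
# speech source from an instantaneous additive mixture by maximising a
# pretrained joint audio-visual likelihood over the demixing weights.
# The single-Gaussian AV model, the log band-energy features and the
# Nelder-Mead optimiser are illustrative assumptions, not the paper's method.
import numpy as np
from scipy.optimize import minimize


def spectral_features(frame, n_bands=8):
    """Log band-energy features of one windowed audio frame (assumption)."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    bands = np.array_split(spec, n_bands)
    return np.log(np.array([b.mean() for b in bands]) + 1e-8)


def av_log_likelihood(audio_feat, video_feat, mean, cov_inv, log_det):
    """Joint log-density of (audio, video) features under one Gaussian."""
    z = np.concatenate([audio_feat, video_feat]) - mean
    return -0.5 * (z @ cov_inv @ z + log_det + z.size * np.log(2.0 * np.pi))


def separate(mixtures, video_feats, av_model, frame_len=256):
    """Estimate demixing weights b that maximise the summed AV log-likelihood.

    mixtures    : (n_mics, n_samples) instantaneous additive mixtures
    video_feats : (n_frames, dim_v) lip-shape parameters, one row per frame
    av_model    : (mean, cov_inv, log_det) of a Gaussian assumed to have been
                  trained on clean audio-visual speech
    """
    mean, cov_inv, log_det = av_model
    n_frames = min(len(video_feats), mixtures.shape[1] // frame_len)

    def neg_log_likelihood(b):
        b = b / np.linalg.norm(b)              # remove the scale ambiguity
        source = b @ mixtures                  # candidate extracted source
        total = 0.0
        for t in range(n_frames):
            frame = source[t * frame_len:(t + 1) * frame_len]
            total += av_log_likelihood(spectral_features(frame),
                                       video_feats[t], mean, cov_inv, log_det)
        return -total

    b0 = np.ones(mixtures.shape[0]) / mixtures.shape[0]
    result = minimize(neg_log_likelihood, b0, method="Nelder-Mead")
    b_hat = result.x / np.linalg.norm(result.x)
    return b_hat, b_hat @ mixtures

In practice the joint audio-visual model would be richer than a single Gaussian, and normalising the demixing vector only fixes the usual scale ambiguity of blind separation; recovering the remaining sources of the mixture is outside the scope of this sketch.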


Related resources

Title of Dissertation: CORTICAL DYNAMICS OF AUDITORY-VISUAL SPEECH: A FORWARD MODEL OF MULTISENSORY INTEGRATION

Title of Dissertation: CORTICAL DYNAMICS OF AUDITORY-VISUAL SPEECH: A FORWARD MODEL OF MULTISENSORY INTEGRATION. Virginie van Wassenhove, Ph.D., 2004. Dissertation Directed By: David Poeppel, Ph.D., Department of Linguistics, Department of Biology, Neuroscience and Cognitive Science Program. In noisy settings, seeing the interlocutor’s face helps to disambiguate what is being said. For this to hap...

Full text

Adaptive Estimation of Time-varying F Speech Based on an Excitat

This paper describes a method of extracting time-varying features that is effective for speech signals with high fundamental frequencies. The proposed method adopts a speech production model that consists of a Time-Varying AutoRegressive (TVAR) process for an articulatory filter and a Hidden Markov Model (HMM) for an excitation source. The model represents waveform amplitude variations by timev...

Full text

Audio-visual speech fragment decoding

This paper presents a robust speech recognition technique called audio-visual speech fragment decoding (AV-SFD), in which the visual signal is exploited both as a cue for source separation and as a carrier of phonetic information. The model builds on the existing audio-only SFD technique which, based on the auditory scene analysis account of perceptual organisation, works by combining a bottom-...

Full text

On timing in time-frequency analysis of speech signals

The objective of this paper is to demonstrate the importance of position of the analysis time window in time-frequency analysis of speech signals. Speech signals contain information about the time varying characteristics of the excitation source and the vocal tract system. Resolution in both the temporal and spectral domains is essential for extracting the source and system characteristics from...

Full text

Noise alters beta-band activity in superior temporal cortex during audiovisual speech processing

Speech recognition is improved when complementary visual information is available, especially under noisy acoustic conditions. Functional neuroimaging studies have suggested that the superior temporal sulcus (STS) plays an important role for this improvement. The spectrotemporal dynamics underlying audiovisual speech processing in the STS, and how these dynamics are affected by auditory noise, ...

Full text



Journal:

Volume   Issue

Pages  -

Publication date: 2003