Lip2AudSpec: Speech reconstruction from silent lip movements video

Authors

  • Hassan Akbari
  • Himani Arora
  • Liangliang Cao
  • Nima Mesgarani
Abstract

In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip-movement videos. We use the auditory spectrogram as the spectral representation of speech, together with its corresponding sound-generation method, which yields more natural-sounding reconstructed speech. Our proposed network consists of an autoencoder that extracts bottleneck features from the auditory spectrogram, which are then used as the target for our main lip-reading network comprising CNN, LSTM, and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with 98% correlation and also improves the quality of the speech reconstructed by the main lip-reading network. Our model, trained jointly on different speakers, is able to extract individual speaker characteristics and gives promising results, reconstructing intelligible speech with superior word-recognition accuracy.
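The two-stage design described above (an autoencoder that compresses the auditory spectrogram into bottleneck codes, and a CNN–LSTM–fully-connected lip-reading network trained to predict those codes from video frames) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, spectrogram bin count, bottleneck dimension, and lip-crop resolution are all assumed values chosen for the example.

```python
import torch
import torch.nn as nn

# Assumed dimensions (not from the paper): 128-bin auditory spectrogram,
# 32-dim bottleneck code, T = 25 video frames of 64x64 grayscale lip crops.
N_BINS, BOTTLENECK, T = 128, 32, 25

class SpecAutoencoder(nn.Module):
    """Compresses each spectrogram frame to a bottleneck code and back."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(N_BINS, 64), nn.ReLU(),
            nn.Linear(64, BOTTLENECK))
        self.decoder = nn.Sequential(
            nn.Linear(BOTTLENECK, 64), nn.ReLU(),
            nn.Linear(64, N_BINS))

    def forward(self, spec):                 # spec: (B, T, N_BINS)
        code = self.encoder(spec)            # (B, T, BOTTLENECK)
        return self.decoder(code), code

class LipReader(nn.Module):
    """CNN per frame -> LSTM over time -> FC predicting bottleneck codes."""
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4))         # -> 32 * 4 * 4 = 512 features
        self.lstm = nn.LSTM(512, 128, batch_first=True)
        self.fc = nn.Linear(128, BOTTLENECK)

    def forward(self, frames):               # frames: (B, T, 1, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).flatten(1)  # (B*T, 512)
        out, _ = self.lstm(feats.view(b, t, -1))           # (B, T, 128)
        return self.fc(out)                  # (B, T, BOTTLENECK)

# Shape check on random inputs.
spec = torch.randn(2, T, N_BINS)
recon, code = SpecAutoencoder()(spec)
pred = LipReader()(torch.randn(2, T, 1, 64, 64))
print(recon.shape, pred.shape)  # torch.Size([2, 25, 128]) torch.Size([2, 25, 32])
```

In this setup the autoencoder would first be trained to reconstruct spectrograms; the lip reader is then trained (e.g., with an MSE loss) against the frozen encoder's codes, and at inference time its predicted codes are passed through the decoder to recover a spectrogram for waveform synthesis.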


Similar articles

Articulatory Strategies for Lip and Tongue Movements in Silent versus Vocalized Speech

In the context of Silent Speech Communication (SSC) development after total laryngectomy rehabilitation, tongue and lip movements were recorded with a portable ultrasound transducer and a CCD video camera respectively. A list of 60 French minimal-pairs and a list of 50 most frequent French words were pronounced in vocalized and silent mode by one speaker. Amplitude and timing of the articulator...


The fusion of visual lip movements and mixed speech signals for robust speech separation

A technique for the early fusion of visual lip movements and a vector of mixed speech signals is proposed. This technique involves the initial recreation of speech signals entirely from the visual lip motions of each speaker. By using geometric parameters of the lips obtained from the Tulips1 database and the Audio-Visual Speech Processing dataset, a virtual speech signal is recreated by using ...


Early speech motor development: Cognitive and linguistic considerations.

This longitudinal investigation examines developmental changes in orofacial movements occurring during the early stages of communication development. The goals were to identify developmental trends in early speech motor performance and to determine how these trends differ across orofacial behaviors thought to vary in cognitive and linguistic demands (i.e., silent spontaneous movement...


Block-Based Motion Estimation Analysis for Lip Reading User Authentication Systems

This paper proposes a lip reading technique for speech recognition by using motion estimation analysis. The method described in this paper represents a sub-system of the Silent Pass project. Silent Pass is a lip reading password entry system for security applications. It presents a user authentication system based on password lip reading. Motion estimation is done for lip movement image sequenc...


Statistical Mapping Between Articulatory and Acoustic Data for an Ultrasound-Based Silent Speech Interface

This paper presents recent developments on our “silent speech interface” that converts tongue and lip motions, captured by ultrasound and video imaging, into audible speech. In our previous studies, the mapping between the observed articulatory movements and the resulting speech sound was achieved using a unit selection approach. We investigate here the use of statistical mapping techniques, ba...



Journal:
  • CoRR

Volume abs/1710.09798  Issue

Pages  -

Publication year 2017