Accepted Manuscript: Improvement to a NAM-captured Whisper-to-speech System

Authors

  • Gérard Bailly
  • Hélène Lœvenbruck
  • Tomoki Toda
Abstract

The system developed at NAIST, which exploits a tissue-conductive sensor (a stethoscopic microphone) to convert Non-Audible Murmur (NAM) to audible speech by GMM-based statistical mapping, is a very promising technique. The quality of the converted speech is, however, still insufficient for computer-mediated communication, notably because of the poor estimation of F0 from unvoiced speech and because of impoverished phonetic contrasts. This paper presents our investigations to improve the intelligibility and naturalness of the synthesized speech, together with first objective and subjective evaluations of the resulting system. The first improvement concerns voicing and F0 estimation. Instead of using a single GMM for both, we estimate a continuous F0 using a GMM trained on target voiced segments only. The continuous F0 estimate is then filtered by a voicing decision computed by a neural network. The objective and subjective improvement is significant. The second improvement concerns the input time window and its dimensionality reduction: we show that the precision of F0 estimation is also significantly improved by extending the input time window from 90 to 450 ms and by using Linear Discriminant Analysis (LDA) instead of the original Principal Component Analysis (PCA). Estimation of the spectral envelope is also slightly improved with LDA but is degraded with larger time windows. A third improvement consists in adding visual parameters as both input and output parameters. The positive contribution
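The first improvement described in the abstract, a continuous F0 estimate filtered by a separate voicing decision, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `gate_f0`, the per-frame voicing probabilities, the 0.5 threshold, and the convention of marking unvoiced frames with 0.0 are all assumptions made for illustration. In the paper, the F0 values would come from the GMM and the voicing decision from the neural network.

```python
from typing import List


def gate_f0(continuous_f0: List[float],
            voicing_prob: List[float],
            threshold: float = 0.5) -> List[float]:
    """Keep the F0 estimate on frames judged voiced; set it to 0.0 elsewhere.

    continuous_f0: one F0 value (Hz) per frame, defined for every frame
                   (hypothetical stand-in for the GMM output).
    voicing_prob:  one voicing probability per frame
                   (hypothetical stand-in for the neural network output).
    threshold:     assumed decision threshold, not taken from the paper.
    """
    if len(continuous_f0) != len(voicing_prob):
        raise ValueError("F0 and voicing tracks must have the same number of frames")
    return [f0 if p >= threshold else 0.0
            for f0, p in zip(continuous_f0, voicing_prob)]


# Toy example: the third frame is judged unvoiced and is zeroed out.
f0_track = [118.0, 121.5, 125.0, 123.0]
voicing = [0.9, 0.8, 0.2, 0.6]
print(gate_f0(f0_track, voicing))
```

Separating the two predictions means the F0 model only needs to be trained on voiced target frames, while the voiced/unvoiced boundary is handled by a dedicated classifier.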


Similar resources

Improvement to a NAM captured whisper-to-speech system

Exploiting a tissue-conductive sensor – a stethoscopic microphone – the system developed at NAIST which converts Non-Audible Murmur (NAM) to audible speech by GMM-based statistical mapping is a very promising technique. The quality of the converted speech is however still insufficient for computer-mediated communication, notably because of the poor estimation of F0 from unvoiced speech and beca...


Improving body transmitted unvoiced speech with statistical voice conversion

A conversion method from Non-Audible Murmur (NAM) to ordinary speech based on statistical voice conversion (NAM-to-Speech) has been proposed towards the realization of a “silent speech telephone.” Although NAM-to-Speech converts NAM to intelligible voices with similar quality to speech, there is still a large problem, i.e., the difficulty of F0 estimation from unvoiced speech. In order to avoi...


Predicting F0 and voicing from NAM-captured whispered speech

The NAM-to-speech conversion proposed by Toda and colleagues, which converts Non-Audible Murmur (NAM) to audible speech by statistical mapping trained on aligned corpora, is a very promising technique, but its performance is still insufficient, mainly due to the difficulty of estimating the F0 of the transformed voice from unvoiced speech. In this paper, we propose a method to improve F0 estimatio...


Speaker identification for whispered speech based on frequency warping and score competition

In certain situations, talkers will intentionally use whisper instead of neutral speech for the sake of privacy or confidentiality, which severely degrades the performance of speaker identification systems trained with only neutral speech. There are considerable differences in the spectral structure between whisper and neutral speech due to an absence of voice harmonic excitation. This study in...


Speaker identification for whispered speech using modified temporal patterns and MFCCs

Speech production variability due to whisper represents a major challenge for effective speech systems. Whisper is used by talkers intentionally in certain circumstances to protect personal privacy. Due to the absence of periodic excitation in the production of whisper, there are considerable differences between neutral and whispered speech in the spectral structure. Therefore, performance of ...



Publication date: 2009