Proposal of structure-to-speech conversion and its application to implementation of infants’ vocal imitation

Authors

  • Daisuke SAITO
  • Satoshi ASAKAWA
  • Nobuaki MINEMATSU
  • Keikichi HIROSE
Abstract

Speech acoustics vary with age, gender, vocal tract length, microphone, and other factors. The authors recently proposed a structural and abstract representation of speech in which these variations are effectively removed. This representation captures only the dynamics of speech. In our previous study, a new framework of speech synthesis based on this abstract representation was proposed and some fundamental investigations were carried out. In this framework, an utterance is modeled by two separate attributes: one corresponding to what is known as the speech Gestalt, which is speaker-invariant, and the other to the embodiment seen in vocal tubes, which characterizes speaker differences. Acoustic signals are generated by using the Gestalt as constraint conditions and the vocal tube embodiment as initial conditions. In other words, the Gestalt can be acoustically realized only when the speaker’s embodiment is considered. This framework can be regarded as an implementation of infants’ vocal imitation. In this study, building on those initial investigations, we improve the accuracy and efficiency of the acoustic realization of the Gestalt by using an analytical method. Experiments on generating continuous utterances of Japanese vowels show the validity of the proposed method.


Similar resources

Structure to speech conversion - speech generation based on infant-like vocal imitation

This paper proposes a new framework of speech generation by imitating “infants’ vocal imitation”. Most of the speech synthesizers take a phoneme sequence as input and generate speech by converting each of the phonemes into a sound sequentially. In other words, they simulate a human process of reading text out. However, infants usually acquire speech generation ability without text or phoneme se...


Optimal event search using a structural cost function - improvement of structure to speech conversion

This paper describes a new and improved method for the framework of structure to speech conversion we previously proposed. Most of the speech synthesizers take a phoneme sequence as input and generate speech by converting each of the phonemes into its corresponding sound. In other words, they simulate a human process of reading text out. However, infants usually acquire speech communication abi...


Physical Interpretation of Word Gestalt Based on Invariant Representation of Speech and Its Application to Implement of Infants' Vocal Imitation

In this paper we propose a new framework of speech generation by imitating “infants’ vocal imitation”. Most of the speech synthesizers take a phoneme sequence as input and generate speech by converting each of the phonemes. However, infants usually acquire speech generation ability without text or phoneme. As developmental psychology states, from the utterances of their parents, they acquire the ...


Human Speech Model Based on Information Separation — Collection

Abstract: This paper points out that no existing technically-implemented speech model is adequate enough to describe one of the most fundamental and unique capacities of human speech processing. Language acquisition of infants is based on vocal imitation [1] but they don’t impersonate their parents and imitate only the linguistic and para-linguistic aspects of the parents’ utterances. The vocal...


Human Speech Model Based on Information Separation — Collection or Separation, That is the Question. —

Nobuaki Minematsu, Graduate School of Information Science and Technology, The University of Tokyo. Abstract: This paper points out that no existing technically-implemented speech model is adequate enough to describe one of the most fundamental and unique capacities of human speech processing. Language acquisition of infa...



Publication date: 2008