Segregating information about the size and shape of the vocal tract using a time-domain auditory model: The stabilised wavelet-Mellin transform
Abstract
We hear vowels pronounced by men and women as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that the auditory system can extract and separate information about the size of the vocal tract from information about its shape. The duration of the impulse response of the vocal tract expands or contracts as the length of the vocal tract increases or decreases. There is a transform, the Mellin transform, that is immune to the effects of time dilation; it maps impulse responses that differ in temporal scale onto a single distribution and encodes the size information separately as a scalar constant. In this paper we investigate the use of the Mellin transform for vowel normalisation. In the auditory system, sounds are initially subjected to a form of wavelet analysis in the cochlea and then, in each frequency channel, the repeating patterns produced by periodic sounds appear to be stabilised by a form of time-interval calculation. The result is like a two-dimensional array of interval histograms and it is referred to as an auditory image. In this paper, we show that there is a two-dimensional form of the Mellin transform that can convert the auditory images of vowel sounds from vocal tracts with different sizes into an invariant Mellin image (MI) and, thereby, facilitate the extraction and separation of the size and shape information associated with a given vowel type. In signal processing terms, the MI of a sound is the Mellin transform of a stabilised wavelet transform of the sound. We suggest that the MI provides a good model of auditory vowel normalisation, and that this provides a good framework for auditory processing from cochlea to cortex. © 2000 Elsevier Science B.V. All rights reserved.
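The scale-invariance property described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's two-dimensional stabilised wavelet-Mellin transform: it applies the ordinary one-dimensional Mellin transform, evaluated at s = 1/2 + jc, to a damped sinusoid standing in for a single vocal-tract resonance. The function names (`mellin_magnitude`, `resonance`), the test signal, the dilation factor of 1.3, and the integration grid are all assumptions chosen for illustration.

```python
import numpy as np

def mellin_magnitude(f, c_values, u_min=-12.0, u_max=4.0, n=16384):
    """|M(1/2 + jc)| of f, where M(s) is the integral of f(t) * t**(s-1) over t > 0.

    With the substitution t = exp(u) the integral becomes a Fourier-type
    integral over u, approximated here by a Riemann sum on a uniform grid.
    """
    u = np.linspace(u_min, u_max, n)
    du = u[1] - u[0]
    g = f(np.exp(u)) * np.exp(0.5 * u)            # f(e^u) * e^(u/2)
    kernel = np.exp(1j * np.outer(c_values, u))   # e^(j*c*u), one row per c
    return np.abs(kernel @ g) * du

def resonance(t, scale=1.0):
    """Damped sinusoid (hypothetical stand-in for an impulse response).

    scale > 1 compresses the response in time, as a shorter tract would.
    """
    ts = scale * t
    return np.exp(-ts) * np.sin(2.0 * np.pi * ts)

c = np.linspace(0.0, 10.0, 41)
a = 1.3  # assumed time-dilation factor between the two "speakers"

mag_ref = mellin_magnitude(lambda t: resonance(t, 1.0), c)
mag_scaled = mellin_magnitude(lambda t: resonance(t, a), c)

# Dilation f(t) -> f(a*t) multiplies M(s) by a**(-s), so on the line
# Re(s) = 1/2 the magnitude changes only by the constant a**(-1/2).
profile_ref = mag_ref / mag_ref.max()
profile_scaled = mag_scaled / mag_scaled.max()
estimated_scale = float(np.mean(mag_ref / mag_scaled) ** 2)

print("max profile difference:", np.max(np.abs(profile_ref - profile_scaled)))
print("estimated dilation factor:", estimated_scale)
```

In this sketch the normalised magnitude profile plays the role of the scale-invariant "shape" description, while the single constant relating the two unnormalised magnitudes carries the "size" information.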
Similar resources
Stabilised wavelet Mellin transform: an auditory strategy for normalising sound-source size
We hear phonemes pronounced by men, women and children as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that we extract and separate the size and shape information of sound sources. The impulse response of the vocal tract is compressed or expanded in time when the length o...
Extracting Size and Shape Information of Sound Source in an Optimal Auditory Processing Model
We hear phonemes pronounced by men, women and children as approximately the same although the length of the vocal tract varies considerably from group to group. At the same time, we can identify the speaker group. This suggests that we extract and separate the size and shape information of sound sources. The impulse response of the vocal tract is compressed or expanded in time when the length o...
An Auditory Vocoder Resynthesis of Speech from an Auditory Mellin Representation
An auditory Mellin transform has been proposed to segregate information about the size and shape of the vocal tract automatically; the process is also independent of glottal pitch. In this paper, we describe a method for resynthesizing speech from the Mellin representation using a high quality vocoder (STRAIGHT), and a nonlinear function to map between the two representations of speech. This en...
GENERAL SOLUTION OF ELASTICITY PROBLEMS IN TWO DIMENSIONAL POLAR COORDINATES USING MELLIN TRANSFORM
In this work, the Mellin transform method was used to obtain solutions for the stress field components in two-dimensional (2D) elasticity problems in terms of plane polar coordinates. The Mellin transformation was applied to the biharmonic stress compatibility equation expressed in terms of the Airy stress potential function, and the boundary value problem transformed to an algebraic ...
The perception of scale in vowels
The resonating properties of many objects provide acoustical correlates which can be used to gain information about the objects. The acoustic signal provides not only shape information (what the sound means) but also size information (how small/big the object is relative to the population). A signal processing algorithm able to isolate both shape and size information is the Mellin transform. It...
Journal title:
- Speech Communication
Volume 36
Publication date 2002