Pitch Contour Stylization Using a Tonal Perception Model

Author

  • P. Mertens
Abstract

An algorithm for automatic pitch contour stylization is described. It is based on a model of tonal perception, such that the resulting stylization is controlled by two perceptual thresholds. The output is the sequence of audible pitch events aligned with the syllables in the utterance.

INTRODUCTION

Stylization is a manual or automatic procedure that modifies the measured F0 contour of an utterance into a simplified but functionally equivalent form, i.e. preserving all melodic information which has a function in speech communication. There are several motivations for doing this: for instance, to reduce the amount of data required to generate pitch contours in synthetic speech, or to isolate the functional parts of the contour (removing the others) and so obtain a representation of the underlying contour, e.g. for intonation teaching or linguistic research.

We propose an approach in which the stylization represents the pitch contour as perceived by the average listener. In other words, the stylization process is seen as a simulation of tonal perception and as a way to measure what is heard. This approach satisfies both goals mentioned earlier: it results in a substantial data reduction, and it filters out F0 events which cannot be heard and hence have no function in prosody.

Still, there are other motivations for computing the perceived pitch: as a way to evaluate phonological intonation models and to obtain an automatic transcription of intonation. This may require some explanation. There is little doubt about the acoustic manifestation of intonation (at least, for pitch): for most speech signals, F0 can be computed in an objective way, with estimation errors below perceptual thresholds. The phonological representation, however, is a sequence of symbols (tones, pitch movements), the determination of which involves a phonetician who interprets the data within a particular model.
The large number of such models suggests the lack of a procedure to evaluate them. How could one decide that the descriptive units of model A are more viable than those of model B, or even that they are psychologically viable at all? To date, there is no clear criterion for the verification of intonation models; as a result, the choice among them is often a matter of personal preference.

When someone describes the sounds he hears, he refers to the auditory image resulting from sensory and perceptual processing, rather than to the acoustic signal. It can easily be seen that the cognitive process of intonation understanding does not have direct access to the acoustic form (F0) but rather to the pitch events after processing by the peripheral auditory system and the perceptual system. Consequently, it is this form that should be the input to the phonological model. Computing this perceived pitch contour narrows the gap between the acoustic and the cognitive domains, because it eliminates one of the assumptions made by phonological models (namely the one about the nature of the input representation).

The rest of this paper gives a quick overview of tonal perception effects, then describes the algorithm implementing them. Finally, we describe some of the results obtained.

TONAL PERCEPTION

What is known about tonal perception?

1. Spectral and amplitude variations in the speech signal affect the way in which pitch variations are perceived, giving rise to a perceptual segmentation of the speech continuum [5]. This segmentation effect results in a sequence of short tonal events aligned with the syllables, rather than a continuous pitch curve for the whole utterance. Unfortunately, no quantitative model describing the contribution of changes in (global) amplitude, spectral energy, and other attributes is yet available.

2. The perception of a changing pitch requires some minimal amount of frequency change as a function of time. Otherwise a static pitch is perceived.
This effect is known as the glissando threshold (G). For a uniform pitch change (with constant slope), G = 0.16 / T^2, where T is the duration of the pitch variation. The effect has been investigated for years [3,4,8,9], both for pure tones and synthetic speech, but not in continuous speech.

3. A change in pitch slope will be perceived provided the slopes differ by some minimal amount, known as the differential glissando threshold (DG). There has been little research on this effect [4].

4. Static tones, i.e. short-term F0 variations which are below threshold G, are perceived with a certain pitch. In a study on the perception of vibrato [1], it was shown that this short-term integration can be modelled by a windowed time average (WTA) function.

Our stylization algorithm simulates these four effects.

DESCRIPTION OF THE STYLIZATION ALGORITHM

Figure 1 shows a block diagram of the stylization algorithm. It consists of several processing steps, some of which are purely acoustic (F0 measurement, voicing determination), while others are related to tonal perception. We will focus on the latter here. Most of the work in the algorithm concerns the determination of the speech fragments for which the perceptual effects are to be computed.

1. Speech segmentation. Since pitch perception is determined by spectral and amplitude changes, the speech signal is first divided into syllable-sized chunks. In the absence of a quantitative model of this effect, several types of segmentation are investigated. The first focusses on spectral change and uses the voiced parts of the syllables [2]; the second favours amplitude change and computes the syllabic nuclei (or loudness peaks) [6]. We will illustrate the results obtained with both segmentations.

2. Short-term perceptual integration of pitch. The WTA model is applied to the F0 in the voiced region of each syllable. This results in a smoothed pitch contour (as can be seen in Figure 2).

3. Syllabic contour segmentation.
Syllabic pitch contours can be compound (e.g. rise-fall); they should be divided into simple, uniform parts first. This results in one or more tonal segments per syllable, each a monotonic pitch change (level, rising, or falling) without an audible change in slope. This segmentation is motivated by the fact that G and DG are obtained for, and should therefore be applied to, such uniform segments.

The syllabic contour segmentation involves two steps. The first locates the turning points in the contour so as to break it down into candidate tonal segments. The second decides which candidate segments are to be grouped. The first step is recursive. Within an analysis interval with an audible pitch change (above G), a new turning point is found at the point of [...]

Figure 1. Block diagram of the pitch contour stylization algorithm. PDA is the pitch determination algorithm. V/UV is the voicing decision.
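
The glissando threshold and the short-term integration described above can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes that G is expressed in semitones per second and T in seconds, which the text does not state, and it uses a plain rectangular trailing window for the WTA, since the actual weighting function is not given here. All function names and the window length are hypothetical.

```python
import math


def semitones(f_hz, ref_hz):
    """Interval from ref_hz to f_hz in semitones."""
    return 12.0 * math.log2(f_hz / ref_hz)


def glissando_threshold(duration_s, coeff=0.16):
    """G = coeff / T^2 (assumed unit: semitones per second, T in seconds)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return coeff / duration_s ** 2


def is_audible_glide(f_start_hz, f_end_hz, duration_s):
    """True if a uniform pitch change has a slope above the threshold G."""
    slope = abs(semitones(f_end_hz, f_start_hz)) / duration_s
    return slope > glissando_threshold(duration_s)


def windowed_time_average(f0_hz, window=5):
    """Smooth F0 samples with a trailing rectangular window.

    The rectangular weighting is an assumption made for illustration only.
    """
    smoothed = []
    for i in range(len(f0_hz)):
        chunk = f0_hz[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

Under these assumptions, a segment for which `is_audible_glide` returns False would be treated as a static tone, and one for which it returns True as a rise or fall. Because G falls with the square of the duration, the same interval is more likely to be heard as a glide when it is spread over a longer segment.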

Similar resources

Automatic pitch contour stylization using a model of tonal perception

A new quantitative model of tonal perception for continuous speech is described. The paper illustrates its ability for automatic stylization of pitch contours, with applications to prosodic analysis and speech synthesis in mind, and evaluates it in a perception experiment. After a discussion of the psychoacoustics of tonal perception and an overview of existing tonal perception models and syste...

The Prosogram: Semi-Automatic Transcription of Prosody Based on a Tonal Perception Model

This paper describes a system for semi-automatic transcription of prosody based on a stylization of the fundamental frequency data (contour) for vocalic (or syllabic) nuclei. The stylization is a simulation of tonal perception of human listeners. The system requires a time-aligned phonetic annotation. The transcription has been applied to several speech corpora.

Perception of Pitch and Tonal Timing: Implications for Mechanisms of Tonogenesis

This paper presents a discussion of tonal categories relating to pitch perception discrimination thresholds and the differences between pitch perception and the perception of tonal timing. Based on the results of a perception experiment using Swedish listeners and tonal timing results from the literature, this paper proposes perceptual mechanisms which may help explain the development of contou...

The effect of global F0 contour shape on the perception of tonal timing contrasts in American English intonation

Results from an ABX perception task involving the contrast between default- and late-timed pitch accents ((L+)H* and L*+H) in American English intonation demonstrate that pitch movement curvature, in addition to turning-point alignment, plays a role in determining listener categorization. A model based on Tonal Center of Gravity, effectively integrating both F0 turning-point and global contour-sh...

Effects of Pitch Contours Stylization and Time Scale Modification on Natural Speech Synthesis

This paper describes a method for generating intonated speech for natural speech synthesis using a prosody generation model. The effect of pitch modification through pitch contour stylization for parameter extraction, and of time scale modification for its implementation, is discussed. An approach for close-copy syllabic stylization is described. In the latter part, algorithm for impl...

Publication date: 1995