Phase Transitions in a Repetitive Speech Task as Gestural Recomposition

Authors

  • Kenneth de Jong
  • Byung-jin Lim
  • Kyoko Nagao
Abstract

This paper examines glottal movement data in rate-controlled repetitions of CV (onset consonant + vowel) and VC (vowel + coda consonant) structures. It replicates Tuller & Kelso's (1991) observation of rate-induced phase transitions in glottal behavior. Specifically, for voiceless consonants, the timing of peak glottal opening in CV's is similar at fast and slow rates. The timing for VC's at slow rates differs from that of CV's, but changes at fast rates to become like that of CV's. Additional aspects of rate-induced changes in glottal behavior were also found, however. For 'voiced' consonants, CV's show the same change in glottal timing as do voiceless post-vocalic VC's. In addition, apparent phase transitions may involve both the timing and magnitude of the gestures, as well as which glottal gestures the speakers implement. Hence, VC phase transitions also seem to involve a change in gestural composition, rather than just a re-coordination of the same gestures.

*Work supported by the NIDCD (grant #R03 DC04095-01A2) and by the NSF (grant #BCS-9910701). We also express deep appreciation to Anders Lofqvist, without whose seemingly endless patience and remarkable expertise none of this work could have been done. We also acknowledge Mafuyu Kitahara and Bushra Zawaydeh for their help on this project. This paper was presented as paper 2aSC11 at the 142nd meeting of the Acoustical Society of America, the abstract of which appears in the Journal of the Acoustical Society of America, 110: 2657.

Introduction

Stetson (1951) reports a number of articulatory studies, several of which involve repetition of various syllabic constructions. There, Stetson observes that simple postvocalic consonants, such as the 'b' in 'eeb', will shift in apparent affiliation when repeated at fast speech rates.
Thus, while a train of repeated 'eeb eeb eeb' is heard as a train of 'eeb's at a slow rate of repetition, as the repetition rate is increased, there comes a point at which it is heard as a train of 'bee's.

Tuller & Kelso (1991) replicated this experimental result with naive subjects. They asked speakers to repeatedly produce a syllable including only a pre-vocalic or a post-vocalic stop at a comfortable speaking rate, and then asked the speakers to increase the rate of repetition in a stepwise fashion. When pieces of the overall train were played to listeners for identification, naive listeners identified the fast-rate repetitions as being CV forms, rather than VC's. In addition, Tuller & Kelso examined glottal transillumination data and measurements of lip movement. They found that a measure of relative timing, the proportion of the way through the lip opening cycle at which peak glottal opening occurred (termed 'glottal phase'), seemed to mirror the listeners' identifications. Glottal phase for VC's was systematically earlier than for CV's at slow rates but, after proceeding through a period where phase tended to come even earlier, tended to shift to values similar to CV's at fast rates. CV's had fixed timing at 40 degrees (one-ninth of a 360 degree cycle defined by successive oral closings) after peak oral closure, regardless of repetition rate. VC's started off at about 20 degrees after peak closure, went through a period where phasing approached 0 degrees, and then converged on a timing of about 40 degrees after peak closure, roughly the same timing as that found in CV's. Tuller & Kelso then proposed that syllabic organization is the result of a bistable dynamical system whose behavior can be neatly captured by the collective variable, glottal phase.

de Jong (2001a, 2001b) examined acoustic records of four speakers of American English repeating voiced and voiceless labial stops in either CV ('pea' or 'bee') forms or VC ('eep' or 'eeb') structures.
Repetition rate was controlled explicitly by having the talkers entrain to metronome patterns that specified various rates and changed them in various ways. The results of these studies were as expected from the previous studies: repeated VC's became similar to CV's as rates increased. However, de Jong (2001a) also found that, even though CV's and VC's are both usually identified as CV's, they are systematically different even at the fastest rates. Coupled with Tuller & Kelso's model, these results suggested that, even if the temporal coordination of VC's does switch to match that of CV's, clearly detectable differences in the production of the utterances are retained. In addition, de Jong (2001b) examined the durational structure of the repeated syllables directly and found that the rate changes affect the temporal structure of VC's and CV's in different ways. Also, the phonemic voicing of the consonant restricts how VC temporal structure is changed. Hence, while glottal-to-oral phasing may index CV-VC differences, there is much more to the differences between the two structures.

The current study seeks to replicate Tuller & Kelso's (1991) results with respect to the relative timing of glottal actions in rate-varied repetitions of VC and CV forms. In addition to examining glottal phase, it includes a broader qualitative study in order to determine how glottal opening and closing movements correspond to the resulting acoustics. Relating glottal data to the acoustics turns out to be particularly important, since glottal actions in VC's are more complex than is suggested by characterizing them in terms of the relative timing of glottal and oral gestures. It also reveals a number of fairly complicated changes which accompany the shift in glottal and oral timing.
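
The 'glottal phase' measure described above can be illustrated with a minimal sketch. It assumes that each oral cycle is delimited by two successive peak oral closures and that phase is the position of peak glottal opening within that cycle, scaled to 360 degrees. The function name, argument names, and time values are hypothetical, chosen only to reproduce the 40 degree and 20 degree figures quoted from Tuller & Kelso (1991); this is not the measurement procedure used in either study.

```python
def glottal_phase_degrees(closure_start, closure_next, glottal_peak):
    """Return the phase (in degrees) of peak glottal opening within one oral cycle.

    closure_start, closure_next: times of two successive peak oral closures
    glottal_peak: time of peak glottal opening
    All three times must be in the same unit (hypothetical example: milliseconds).
    """
    cycle = closure_next - closure_start
    if cycle <= 0:
        raise ValueError("closure_next must come after closure_start")
    # Proportion of the cycle elapsed at peak glottal opening, scaled to 360 degrees.
    return 360.0 * (glottal_peak - closure_start) / cycle


# Hypothetical cycle of 450 ms between successive oral closures.
cv_phase = glottal_phase_degrees(0.0, 450.0, 50.0)  # peak opening 50 ms after closure
vc_phase = glottal_phase_degrees(0.0, 450.0, 25.0)  # peak opening 25 ms after closure
print(cv_phase, vc_phase)
```

Because the measure is a proportion of the cycle, the same phase value can be obtained at any repetition rate, which is what allows timing to be compared across slow and fast repetitions.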

Publication date: 2003