Session 5aSCb: Production and Perception II: The Speech Segment (Poster Session) 5aSCb33. Automating phonetic measurement: The case of voice onset time

Authors

  • Neville Ryant
  • Jiahong Yuan
Abstract

We present an architecture for locating phonetic events accurately in time, and for measuring time differences between nearby events, using Voice Onset Time (VOT) as a case study. Although VOT remains a central concern in the field, phoneticians' VOT measurements generally continue to rely on human judgment. This requires significant labor, makes even large laboratory experiments onerous, and prevents the field from taking full advantage of the millions of hours of digital speech now becoming available. Our algorithm automates VOT measurement accurately by combining HMM forced alignment, which determines approximate stop boundaries, with paired burst and voicing-onset detectors. Each detector is a frame-level max-margin classifier operating on the scale-space projection of a small number of relevant acoustic features. On a large set of clean lab speech, the system has a mean absolute error (relative to human annotation) of only 2.8 ms, with 98% of errors under 10 ms. On a subcorpus independently annotated by two of the authors, the system agreed with the two human annotators as well as they agreed with one another (1.49 ms vs. 1.50 ms). Promising results on other datasets will be reported. The system will be released as open-source software.
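As a rough illustration of the final steps of such a pipeline, here is a minimal Python sketch under several stated assumptions. Per-frame burst and voicing-onset scores are taken as given; the max-margin classifiers and scale-space features described above are not reproduced. The search window stands in for the approximate stop boundaries from forced alignment, each event is taken to be the highest-scoring frame inside that window, and the 1 ms frame step is hypothetical. The names `pick_event`, `estimate_vot`, `error_summary`, and `FRAME_MS` are illustrative only and are not taken from the authors' software. `error_summary` computes the two figures the abstract reports: mean absolute error and the share of errors below 10 ms.

```python
from typing import Sequence, Tuple

# Hypothetical frame step in milliseconds; the abstract does not state one.
FRAME_MS = 1.0


def pick_event(scores: Sequence[float], start: int, end: int) -> int:
    """Return the index of the highest-scoring frame in [start, end).

    Ties go to the earliest frame, because max() keeps the first maximum.
    """
    if not 0 <= start < end <= len(scores):
        raise ValueError("search window must be non-empty and inside the score array")
    return max(range(start, end), key=lambda i: scores[i])


def estimate_vot(
    burst_scores: Sequence[float],
    voicing_scores: Sequence[float],
    window: Tuple[int, int],
    frame_ms: float = FRAME_MS,
) -> float:
    """Estimate VOT (ms) as voicing-onset time minus burst time.

    `window` plays the role of the approximate stop boundaries from forced
    alignment (an assumption of this sketch). Both events are searched
    independently inside it, so a negative result (voicing before the
    burst, i.e. prevoicing) is possible.
    """
    start, end = window
    burst = pick_event(burst_scores, start, end)
    voicing = pick_event(voicing_scores, start, end)
    return (voicing - burst) * frame_ms


def error_summary(
    predicted: Sequence[float],
    reference: Sequence[float],
    threshold_ms: float = 10.0,
) -> Tuple[float, float]:
    """Return (mean absolute error in ms, fraction of errors below threshold)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    mae = sum(errors) / len(errors)
    within = sum(e < threshold_ms for e in errors) / len(errors)
    return mae, within


if __name__ == "__main__":
    # Toy detector scores for one stop token (made-up numbers).
    burst = [0.1, 0.9, 0.2, 0.0, 0.0]
    voicing = [0.0, 0.1, 0.2, 0.3, 0.8]
    print(estimate_vot(burst, voicing, (0, 5)))  # 3.0
    # Toy predicted vs. human VOTs in ms (made-up numbers).
    print(error_summary([3.0, 12.0, 20.0], [2.0, 10.0, 35.0]))
```

In this sketch, restricting the search to the aligned window is what keeps the frame-level detectors from firing on unrelated events elsewhere in the utterance. A real system could also constrain the voicing search relative to the detected burst, which this sketch does not attempt.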

Related articles

Speech Communication Session 5aSCb: Production and Perception II: The Speech Segment (Poster Session) 5aSCb27. Pharyngeal constriction in English diphthong production

This study tests the hypothesis that the acoustic difference between [a] in English diphthongs (e.g. [a] in "pie'd") and its corresponding monophthong (e.g. [a] in "pod") results from the same pharyngeal gesture being truncated by the following palatal glide in the diphthongal environment. Production data were collected with real-time MRI and have been analyzed using the direct image analysis (...

Speech Communication Session 5aSCb: Production and Perception II: The Speech Segment (Poster Session) 5aSCb53. One small step for (a) man: Function word reduction and acoustic ambiguity

"That's one small step for man, one giant leap for mankind." Neil Armstrong insisted for years that his famous quote upon landing on the moon was misheard, and that he had said "one small step for a man." This controversy has continued, as examinations of the sound files of his transmission have yielded mixed opinions about whether he produced "a." The disagreement stems partly from the fact that...

Speech Communication Session 5aSCb: Production and Perception II: The Speech Segment (Poster Session) 5aSCb39. On distinguishing articulatory configurations and articulatory tasks: Tamil retroflex consonants

Speech production can be described in multiple coordinate frames: articulatory configurations, gestural tasks, and acoustic patterns. Examination of the achievement of retroflex stops and liquids in Tamil suggests that we must consider separately the gestural task of apical post-alveolar constriction and the articulatory maneuver to achieve the task. The maneuver of the tongue during retroflex ...

Speech Communication Session 5aSCb: Production and Perception II: The Speech Segment (Poster Session) 5aSCb15. Articulatory overlap in English syllables with postvocalic /ɹ/

*Corresponding author's address: Linguistics, University of Southern California, 3601 Watt Way, GFS 301, Los Angeles, CA 90089-1693. In General American English (GAE), only two full vowels [ɑ, ɔ] occur in syllables ending in [ɹ] plus a non-coronal consonant, e.g. , . An articulatory study of rhotic production by three speakers of GAE was conducted using real-time s...

Speech Communication Session 5aSCb: Production and Perception II: The Speech Segment (Poster Session) 5aSCb47. A comparative cross-linguistic study of vocal tract shaping in sibilant fricatives in English, Serbian and Mandarin using real-time magnetic resonance imaging

An articulatory study of sibilant fricatives is described, with the goal of describing variability in lingual articulation across languages. Real-time Magnetic Resonance Imaging (rtMRI) data were collected from three speakers each of English and Mandarin and two speakers of Serbian and reconstructed at a rate of 22.4 frames per second. Parallel acoustic data were also collected and subsequently ...

Publication date: 2013