Sinewave Speech Perception in a CI User

Authors

  • Winston D. Goh
  • David B. Pisoni
  • Karen I. Kirk
  • Robert E. Remez
  • Stacey Yount
  • Lorin Lachs
Abstract

We investigated a post-lingually deafened cochlear implant user’s ability to perceive sinewave replicas of spoken sentences. The patient, Mr. S, transcribed sinewave sentences under audio-only (AO), visual-only (VO), and audio-visual (A+V) conditions. His performance was compared to data collected from a group of normal-hearing participants in an earlier study by R.E. Remez, J.M. Fellowes, D.B. Pisoni, W.D. Goh, and P.E. Rubin (1998). Results showed that Mr. S derived a larger gain from the additional visual information provided by the talker’s face than the normal-hearing controls did. The increase in performance under A+V presentation reflected this patient’s superior lip-reading skills and his ability to integrate the information provided by the talking face with the sinewave speech to perceive the underlying sentence. Implications of these findings for multimodal phonetic coherence in speech perception are discussed.

It has long been known that combining audio and visual information facilitates the perception of speech. In their pioneering study, Sumby and Pollack (1954) demonstrated that the intelligibility of spoken words can be enhanced by as much as +15 dB in noisy environments if listeners are able to see the talker’s face. This is a substantial gain in performance that surpasses even the best hearing aid devices. Using visual information from a dynamically articulating face for phonetic and lexical identification is a skill from which almost everyone benefits when listening in noisy environments, especially as people get older and their hearing deteriorates (Summerfield, 1987). For the hearing-impaired population, the visual route may play an especially major role in speech perception. Some of the speech cues for consonants that are difficult to hear are easy to see, and vice versa (Walden, Prosek, Montgomery, Scherr, & Jones, 1975).
For example, /f/ and /θ/ are auditorily confusable, but they are very distinct visually when the talker’s articulatory movements can be seen. The enormous gain from seeing the talker’s face is eloquently captured by a question frequently asked of hearing-aid practitioners – “Doctor, why can I understand you so much more clearly when I wear my glasses?” (Summerfield, 1987). Research into the nature of audio-visual integration in speech perception can therefore provide substantial insights and applications for rehabilitative procedures, techniques, and training methods to assist the hearing impaired. The study of multimodal speech perception also raises many important theoretical issues about the scope and domain of current models of speech perception and spoken language processing (see Bernstein, Demorest, & Tucker, in press; Massaro, 1998).

The absolute gain in performance observed from the visual aspects of speech is highest in a noisy environment or in other conditions that make auditory perception difficult (Sumby & Pollack, 1954). Therefore, the best way to observe the influence of visual information is to look at identification performance with impoverished auditory stimuli. The traditional way of studying this problem was to manipulate the signal-to-noise ratio (SNR) of the environment in which the speech stimuli are presented. Another way involved reducing the amount of information that is normally available in the auditory speech waveform. One such technique is to use sinewave speech instead of natural speech (Remez, Rubin, Berns, Pardo, & Lang, 1994; Remez, Rubin, Pisoni, & Carrell, 1981). In sinewave speech, time-varying sinusoidal waveforms are generated by a digital synthesizer to match the LPC-derived center frequencies and amplitudes of the formants in the natural utterance.
The synthetic sinewave pattern preserves the dynamics of frequency and amplitude variations observed in natural speech over time, but differs from natural speech in several important ways. There are no harmonics, broadband formant structures, formant frequency transitions, steady-state formants, or changes in fundamental frequency. In short, sinewave speech patterns contain none of the “traditional” speech cues that are assumed to form the basis of speech perception – e.g., formant frequency transitions that cue manner and place of articulation (see Remez et al., 1981). Despite the unnatural characteristics of sinewave speech, these sound patterns are still intelligible (Remez et al., 1981, 1994). The absence of traditional acoustic cues for phonetic perception implies that sinusoidal replicas of speech should be perceived as independently changing tones and not as an integrated, linguistic percept. However, listeners are still able to extract the phonetic and lexical properties of the utterance from the highly impoverished, skeletal representation of the natural token that is preserved in the sinewave replica. This result suggests that sufficient phonetic information is still encoded in the relational and time-varying structure that is represented in the sinewave pattern, even though the synthetic waveform is obviously not producible by a vocal tract.

Sinewave speech perception also shows the multimodal facilitation observed for natural speech (Remez, Fellowes, Pisoni, Goh, & Rubin, 1998). A considerable increase in identification performance was found when the sinewave patterns were presented in an audio-visual context compared to an audio-only context. Previous studies on sinewave speech perception have so far used only participants who had normal hearing at the time of testing.
Since audio-visual speech perception may be even more critical for people who have hearing impairment, it is important to begin investigations into how members of this clinical population perceive sinewave speech. In particular, how would hearing-impaired individuals fitted with a cochlear implant (CI) fare in listening to sinewave speech under different presentation conditions? We are especially interested in patients who perform very well with their CI and who demonstrate the ability to use visual information to mitigate their hearing impairment. Would such users be able to integrate visual information with very unnatural auditory patterns? In this paper, we report the performance of one patient, Mr. S, in transcribing sinewave sentences under audio-only (AO), visual-only (VO), and audio-visual (A+V) conditions and then compare his performance to a group of normal-hearing participants whose data was collected by Remez et al. (1998).
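The synthesis scheme described above — replacing each formant with a single tone that tracks the LPC-derived center frequency and amplitude of that formant over time — can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the synthesizer actually used by Remez and colleagues; the function name, sampling rate, frame rate, and linear-interpolation scheme are all assumptions for the sketch.

```python
import numpy as np

def sinewave_speech(formant_freqs, formant_amps, fs=16000, frame_rate=100):
    """Synthesize a sinewave-speech replica from formant tracks.

    formant_freqs, formant_amps: arrays of shape (n_frames, n_formants)
    holding per-frame formant center frequencies (Hz) and linear
    amplitudes, e.g. as estimated by LPC analysis of a natural utterance.
    Returns a mono waveform in [-1, 1] sampled at fs.
    """
    n_frames, n_formants = formant_freqs.shape
    samples_per_frame = fs // frame_rate
    n_samples = n_frames * samples_per_frame
    sample_idx = np.arange(n_samples)
    frame_starts = np.arange(n_frames) * samples_per_frame

    out = np.zeros(n_samples)
    for k in range(n_formants):
        # Upsample the frame-rate tracks to the audio rate by linear
        # interpolation so frequency and amplitude vary smoothly.
        freq = np.interp(sample_idx, frame_starts, formant_freqs[:, k])
        amp = np.interp(sample_idx, frame_starts, formant_amps[:, k])
        # Integrate instantaneous frequency to obtain a continuous phase,
        # then add one time-varying sinusoid per formant track.
        phase = 2 * np.pi * np.cumsum(freq) / fs
        out += amp * np.sin(phase)

    # Normalize; the guard avoids division by zero for silent input.
    return out / max(1e-9, np.abs(out).max())
```

Note that the output, by construction, has no harmonics and no fundamental frequency: each formant is a pure tone, so the signal carries only the time-varying pattern of formant center frequencies and amplitudes that the paper identifies as sufficient for phonetic perception.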


Similar Articles

Correlation between Auditory Spectral Resolution and Speech Perception in Children with Cochlear Implants

Background: Variability in speech performance is a major concern for children with cochlear implants (CIs). Spectral resolution is an important acoustic component in speech perception. Considerable variability and limitations of spectral resolution in children with CIs may lead to individual differences in speech performance. The aim of this study was to assess the correlation between auditory ...

Learning to recognize talkers from natural, sinewave, and reversed speech samples.

In 5 experiments, the authors investigated how listeners learn to recognize unfamiliar talkers and how experience with specific utterances generalizes to novel instances. Listeners were trained over several days to identify 10 talkers from natural, sinewave, or reversed speech sentences. The sinewave signals preserved phonetic and some suprasegmental properties while eliminating natural vocal q...

Discrimination of synthetic full-formant and sinewave /ra-la/ continua by budgerigars (Melopsittacus undulatus) and zebra finches (Taeniopygia guttata).

Discrimination of three synthetic versions of a /ra-la/ speech continuum was studied in two species of birds. The stimuli used in these experiments were identical to those used in a previous study of speech perception by humans [Best et al., Percept. Psychophys. 45, 237-250 (1989)]. Budgerigars and zebra finches were trained using operant conditioning and tested on three different series of acou...

Information for coarticulation: Static signal properties or formant dynamics?

Perception of a speech segment changes depending on properties of surrounding segments in a phenomenon called compensation for coarticulation (Mann, 1980). The nature of information that drives these perceptual changes is a matter of debate. One account attributes perceptual shifts to low-level auditory system contrast effects based on static portions of the signal (e.g., third formant [F3] cen...

Perception of sinewave vowels

There is a significant body of research examining the intelligibility of sinusoidal replicas of natural speech. Discussion has followed about what the sinewave speech phenomenon might imply about the mechanisms underlying phonetic recognition. However, most of this work has been conducted using sentence material, making it unclear what the contributions are of listeners’ use of l...


Publication year: 2000