The modulation spectrogram: in pursuit of an invariant representation of speech
نویسندگان
چکیده
Understanding the human ability to reliably process and decode speech across a wide range of acoustic conditions and speaker characteristics is a fundamental challenge for current theories of speech perception. Conventional speech representations such as the sound spectrogram emphasize many spectro-temporal details that are not directly germane to the linguistic information encoded in the speech signal and which consequently do not display the perceptual stability characteristic of human listeners. We propose a new representational format, the modulation spectrogram, that discards much of the spectro-temporal detail in the speech signal and instead focuses on the underlying, stable structure incorporated in the low-frequency portion of the modulation spectrum distributed across critical-band-like channels. We describe the representation and illustrate its stability with color-mapped displays and with results from automatic speech recognition experiments.
منابع مشابه
Cipher text only attack on speech time scrambling systems using correction of audio spectrogram
Recently permutation multimedia ciphers were broken in a chosen-plaintext scenario. That attack models a very resourceful adversary which may not always be the case. To show insecurity of these ciphers, we present a cipher-text only attack on speech permutation ciphers. We show inherent redundancies of speech can pave the path for a successful cipher-text only attack. To that end, regularities ...
متن کاملSpeech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR
The performance of an automatic speech recognition (ASR) system degrades severely in noisy and reverberant environments in part due to the lack of robustness in the underlying representations used in the ASR system. On the other hand, the auditory processing studies have shown the importance of modulation filtered spectrogram representations in robust human speech recognition. Inspired by these...
متن کاملLearning Binaural Spectrogram Features for Azimuthal Speaker Localization
Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extraction of features which are informative about speaker’s position and yet invariant to sound level and spectral modulation present in the signal. This paper demonstrates that this can be achieved with Independent Component Analysis (ICA) applied to ...
متن کاملLearning binaural spectrogram features for azimuthal speaker localization
Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extraction of features which are informative about speaker’s position and yet invariant to sound level and spectral modulation present in the signal. This paper demonstrates that this can be achieved with Independent Component Analysis (ICA) applied to ...
متن کاملRobust speech recognition using the modulation spectrogram
The performance of present-day automatic speech recognition (ASR) systems is seriously compromised by levels of acoustic interference (such as additive noise and room reverberation) representative of real-world speaking conditions. Studies on the perception of speech by human listeners suggest that recognizer robustness might be improved by focusing on temporal structure in the speech signal th...
متن کامل