Word Prominence Detection using Robust yet Simple Prosodic Features
نویسندگان
چکیده
Automatic detection of word prominence can provide valuable information for downstream applications such as spoken language understanding. Prior work on automatic word prominence detection exploit a variety of lexical, syntactic, and prosodic features and model the task as a sequence labeling problem (independently or using context). While lexical and syntactic features are highly correlated with the notion of word prominence, the output of speech recognition is typically noisy and hence these features are less reliable than the acousticprosodic feature stream. In this work, we address the automatic detection of word prominence through novel prosodic features that capture the changes in F0 curve shape and magnitude in conjunction with duration and energy. We contrast the utility of these features with aggregate statistics of F0, duration and energy used in prior work. Our features are simple to compute yet robust to the inherent difficulties associated with identifying salient points (such as F0 peaks) in the F0 contour. Feature analysis demonstrates that these novel features are significantly more predictive than the standard aggregation-based prosodic features. Experimental results on a corpus of spontaneous speech indicate that prominence detection accuracy using only the new prosodic features is better than using both lexical and syntactic features.
منابع مشابه
The Impact of Word Alignment Accuracy on Audio-visual Word Prominence Detection
To automatically detect prominent syllables or words, most approaches require a segmentation of the speech signal and a subsequent extraction of prosodic features in these segments. In this paper we investigate the impact of the precision of this segmentation on the detection. We perform the segmentation of our audiovisual prosodically rich corpus based on an HMM trained on a large dataset. The...
متن کاملAutomatic detection of sentence prominence in speech using predictability of word-level acoustic features
Automatic detection of prominence in speech is an important task for many spoken language applications. However, most previous approaches rely on the availability of a corpus that is annotated with prosodic labels in order to train classifiers, therefore lacking generality beyond high-resourced languages. In this paper, we propose an algorithm for the automatic detection of sentence prominence ...
متن کاملWord segmentation in Persian continuous speech using F0 contour
Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...
متن کاملIdentifying prosodic prominence patterns for English text-to-speech synthesis
This thesis proposes to improve and enrich the expressiveness of English Textto-Speech (TTS) synthesis by identifying and generating natural patterns of prosodic prominence. In most state-of-the-art TTS systems the prediction from text of prosodic prominence relations between words in an utterance relies on features that very loosely account for the combined effects of syntax, semantics, word i...
متن کاملNoise Robust Speech Recognition Using Prosodic Information
This paper proposes a noise robust speech recognition method for Japanese utterances using prosodic information. In Japanese, the fundamental frequency (F0) contour conveys phrase intonation and word accent information. Consequently, it also conveys information about prosodic phrase and word boundaries. This paper first proposes a noise robust F0 extraction method using the Hough transform, whi...
متن کامل