Inducing decision tree pronunciation variation models from annotated speech data
Abstract
A model of pronunciation of words in discourse context has been induced from the annotation of a spoken language corpus. The information included in the annotation is a set of variables hypothesised to be important for the pronunciation of words in discourse context. The annotation is connected to segmentally defined units on tiers corresponding to linguistically relevant units: the discourse, the utterance, the phrase, the word, the syllable and the phoneme. The model is represented as a tree structure, making it transparent for analysis and easy to use in a speech synthesis system. Using phonemic canonical pronunciation representations to estimate the segmental string of the annotated data gives a 22.1% phone error rate. Decision tree pronunciation variation models generated in a tenfold cross validation procedure showed an average phone error rate of 9.9%. Using multiple context variables for modelling pronunciation variation could thus reduce the error rate by 55%, compared to a baseline using canonical pronunciation representations.
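The 55% figure above is a relative reduction, not a difference in percentage points. A minimal sketch of that arithmetic follows, using the two phone error rates reported in the abstract. The function name `relative_reduction` is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
def relative_reduction(baseline_rate: float, model_rate: float) -> float:
    """Return the fraction of the baseline error removed by the model.

    Both rates must be in the same unit (here, percent).
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - model_rate) / baseline_rate


# Phone error rates reported in the abstract, in percent.
canonical_per = 22.1      # baseline: canonical pronunciation representations
decision_tree_per = 9.9   # average over the tenfold cross validation

absolute_drop = canonical_per - decision_tree_per
relative_drop = relative_reduction(canonical_per, decision_tree_per)

print(f"absolute reduction: {absolute_drop:.1f} percentage points")
print(f"relative reduction: {relative_drop:.1%}")
```

The absolute drop is 12.2 percentage points; dividing by the 22.1% baseline gives roughly 0.55, which matches the reduction stated in the abstract.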
Similar resources
Modeling Pronunciation Variation for Cantonese Speech Recognition
Due to the large variability of pronunciation in spontaneous speech, pronunciation modeling becomes a more challenging and essential part of speech recognition. In this paper, we describe two different approaches to pronunciation modeling using decision trees. At the lexical level, a pronunciation variation dictionary is built to obtain alternative pronunciations for each word, in which each entr...
Annotating Speech Data for Pronunciation Variation Modelling
This paper describes methods for annotating recorded speech with information hypothesised to be important for the pronunciation of words in discourse context. Annotation is structured into six hierarchically ordered tiers, each tier corresponding to a segmentally defined linguistic unit. Automatic methods are used to segment and annotate the respective annotation tiers. Decision tree models tra...
Contextual Word and Syllable Pronunciation Models
This work focuses on the evaluation of models of syllable and word pronunciations constructed automatically using the Broadcast News corpus of radio and television news reports. Previous work [4] introduced the concept of extended-length decision tree models; here I report on ASR-independent assessment of these models. This study also discusses integration of static and dynamic pronunciation ev...
Modeling Pronunciation Variation in Conversational Speech using Syntax and Discourse
A significant source of variation in spontaneous speech is due to intra-speaker pronunciation changes. Previous work in automatic speech recognition has identified several factors that affect pronunciation variability such as phonetic context and speaking rate. This work examines new higher level information sources: syntax and discourse structure, specifically the relationship between these fa...
Modeling pronunciation variation using artificial neural networks for English spontaneous speech
Pronunciation variation in conversational speech has caused a significant number of word errors in large vocabulary automatic speech recognition. Rule-based approaches and decision-tree based approaches have been previously proposed to model pronunciation variation. In this paper, we report our work on modeling pronunciation variation using artificial neural networks (ANN). The results we achieve...