Prosodic word-based error correction in speech recognition using prosodic word expansion and contextual information
Authors
Abstract
In this study, considering the effect of phrase grouping in spontaneous speech, prosodic words, rather than lexical words, are adopted as the units for correcting errors in speech recognition results. The prosodic words and their corresponding misrecognized word fragments are extracted from a speech database to construct a misrecognized word fragment table. For each word fragment in a recognized word sequence, the prosodic words that are likely to have been misrecognized as that fragment are retrieved from the table for prosodic word candidate expansion. Prosodic word-based contextual information, in the form of substitution and concatenation scores, is then incorporated into a probabilistic model to find the best word fragment sequence as the corrected output. Experimental results show that the proposed method achieved an F1 score of 0.32, an improvement of 0.18 and 0.10 over the SMT-based and lexical word-based approaches, respectively.
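The pipeline described in the abstract (table lookup, candidate expansion, then a search for the best-scoring sequence) can be illustrated with a minimal sketch. This is not the authors' model: the function name `correct`, the table layout, the use of a bigram probability as a stand-in for the concatenation score, the `floor` smoothing value, and all example data are assumptions made here for illustration only.

```python
import math

# Hypothetical misrecognized-fragment table (assumption, not from the paper):
# recognized fragment -> {candidate prosodic word: substitution probability}
table = {"beach": {"speech": 0.6, "beach": 0.4}}

# Hypothetical concatenation scores, modeled here as bigram probabilities.
bigram = {("recognize", "speech"): 0.5, ("recognize", "beach"): 0.01}


def correct(fragments, table, bigram, floor=1e-6):
    """Viterbi search for the best candidate sequence (illustrative only)."""
    if not fragments:
        return []
    # Candidate expansion: a fragment missing from the table keeps itself.
    lattice = [table.get(f, {f: 1.0}) for f in fragments]
    # best[w] = (log score, path) of the best sequence ending in candidate w
    best = {w: (math.log(p), [w]) for w, p in lattice[0].items()}
    for cands in lattice[1:]:
        new_best = {}
        for w, p in cands.items():
            # Pick the predecessor that maximizes score + concatenation term.
            score, path = max(
                (
                    (s + math.log(bigram.get((prev, w), floor)), pth)
                    for prev, (s, pth) in best.items()
                ),
                key=lambda t: t[0],
            )
            # Add the substitution term for choosing w at this position.
            new_best[w] = (score + math.log(p), path + [w])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


print(correct(["recognize", "beach"], table, bigram))
# prints ['recognize', 'speech']
```

With the toy data above, "beach" is replaced by "speech" because the product of its substitution and concatenation scores (0.6 × 0.5) exceeds that of keeping "beach" (0.4 × 0.01). Working in log space avoids numerical underflow on longer sequences.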
Similar resources
Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We e...
Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis
Recent studies have shown the effectiveness of word vectors in DNN-based speech synthesis. However, these word vectors, trained on a large amount of text, generally carry semantic information rather than the prosodic information that is important for speech synthesis. Therefore, if word vectors that take prosodic information into account can be obtained, it would be expected t...
Using prosodic information for disambiguation purposes
In this work, we describe how prosodic information can be employed to improve the performance of an Automatic Speech Recognizer (ASR) for specific restricted tasks. The approach exploits additional prosodic information in a post-processing stage. Prosodic features are estimated at word level; this additional information is encoded through a feature extractor and is then modeled using a statisti...
Design and evaluation of a speech synthesis model based on concatenation of prosodic-context-sensitive units
This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Persian text-to-speech (TTS) synthesis system. The syllables used are prosodically conditioned in the sense that a single conventional syllable is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The three levels of the Per...
Simultaneous recognition of words and prosody in the Boston University Radio Speech Corpus
This paper describes automatic speech recognition systems that satisfy two technological objectives. First, we seek to improve the automatic labeling of prosody, in order to aid future research in automatic speech understanding. Second, we seek to apply statistical speech recognition models of prosody for the purpose of reducing the word error rate of an automatic speech recognizer. The systems...