Speech recognition based on acoustically derived segment units
Abstract
This paper describes a new method of word model generation based on acoustically derived segment units (henceforth ASUs). An ASU-based approach has two advantages: it is not tied to humanly predetermined phonemes, and it generates acoustic units consistently by using the maximum likelihood (ML) criterion. The former advantage is effective when it is difficult to map acoustics to a phone, as with highly co-articulated spontaneous speech. To implement an ASU-based modeling approach in a speech recognition system, we must first answer two questions: (1) How do we design an inventory of acoustically derived segmental units? (2) How do we model the pronunciations of lexical entries in terms of the ASUs? For the second question, we propose an ASU-based word model generation method that composes the ASU statistics, that is, their means, variances and durations. The effectiveness of the proposed method is shown through spontaneous word recognition experiments.
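As a rough illustration of what "composing" unit statistics could look like, the sketch below merges several units into one by duration-weighted moment matching. This is an assumption made for illustration, not the procedure from the paper: the abstract does not specify the composition rule, and the names `UnitStats` and `compose` are hypothetical. Features are kept one-dimensional for brevity.

```python
from dataclasses import dataclass


@dataclass
class UnitStats:
    """Hypothetical container for one unit's statistics (1-D feature)."""
    mean: float      # mean of the acoustic feature
    var: float       # variance of the acoustic feature
    duration: float  # average duration in frames


def compose(units):
    """Merge units by duration-weighted moment matching (illustrative assumption).

    The merged mean and variance are those of the mixture obtained by
    weighting each unit by its duration; the merged duration is the
    plain average of the input durations.
    """
    total = sum(u.duration for u in units)
    if total <= 0:
        raise ValueError("total duration must be positive")
    mean = sum(u.duration * u.mean for u in units) / total
    # Second moment of the duration-weighted mixture.
    second = sum(u.duration * (u.var + u.mean ** 2) for u in units) / total
    return UnitStats(mean=mean, var=second - mean ** 2,
                     duration=total / len(units))


merged = compose([UnitStats(0.0, 1.0, 2.0), UnitStats(2.0, 1.0, 2.0)])
```

With the two example units, the merged variance is larger than either input variance because the spread between the two means is absorbed into it.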
Similar resources
Design of a speech recognition system based on acoustically derived segmental units
The design of a speech recognition system based on acoustically derived segmental units can be divided into three steps: unit design, lexicon building and pronunciation modeling. We formulate an iterative unit design procedure which consistently uses a maximum likelihood (ML) objective in successive application of resegmentation and model re-estimation. The lexicon building allows multi-word entri...
A comparison of broad phonetic and acoustic units for noise robust segment-based phonetic recognition
In this paper, we compare speech recognition performance using broad phonetically- and acoustically-motivated units as a pre-processor in designing a novel noise robust landmark detection and segmentation algorithm. We introduce a cluster evaluation method to measure acoustic unit cluster quality. On the noisy TIMIT task, we find that the acoustic and phonetic segmentation approaches offer signif...
Unsupervised Learning of Non-Uniform Segmental Units for Acoustic Modeling in Speech Recognition
Great progress has been made in the development of recognition systems for continuous read speech but the performance of these systems degrades severely when they are applied to spontaneous speech. This indicates that a different approach in modeling is required to design a system that is better suited to spontaneous speech. Our approach is to combine two advances proposed in previous work: the...
Are Initial/Final Units Acoustically Accurate?
We show a comparative study of subword unit segmentation of Mandarin speech data. Most HMM recognition systems use initial/finals as subword units for Mandarin speech. We find that such a division of monosyllable data into initial/final units is not always supported by acoustic evidence. We implement a delta-MFCC-based segmentation method and compare its output with that of Viterbi segmentation...
Modeling linguistic segment and turn boundaries for n-best rescoring of spontaneous speech
Language modeling, especially for spontaneous speech, often suffers from a mismatch of utterance segmentations between training and test conditions. In particular, training often uses linguistically-based segments, whereas testing occurs on acoustically determined segments, resulting in degraded performance. We present an N-best rescoring algorithm that removes the effect of segmentation mismat...