Improving speech systems built from very little data
Authors
Abstract
This paper studies two ways of helping non-specialist users develop speech systems for new languages from limited data. First, focused web re-crawling finds additional text examples that match the domain specified by the user; this improves the language model and cuts word error rate nearly in half. Second, iterative voice building with interleaved lexicon construction uses the voice from the previous iteration to help construct an improved voice: 4.5 hours of the user’s time reduces the transcription error rate from 32% to 4%.
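The abstract reports its results as error rates. As a rough illustration only, the sketch below computes word error rate in the conventional way (word-level edit distance divided by the number of reference words) and the relative reduction between two error rates. The abstract does not say how the paper scores its systems, so the function names `word_error_rate` and `relative_reduction` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace; a real evaluation
    would normally also normalize case and punctuation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error rate that was removed."""
    if before <= 0:
        raise ValueError("'before' must be positive")
    return (before - after) / before


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(relative_reduction(0.32, 0.04))
```

Under this definition, the reported drop in transcription error rate from 32% to 4% corresponds to a relative reduction of 87.5%.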
Similar resources
Improving the performance of MFCC for Persian robust speech recognition
Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
English-Spanish Bilingual Alphabet for Embedded Speech Recognition
This article introduces the phonetic alphabet used to train acoustic models on a mixture of Spanish and American English data. The aim is to improve Spanish speech recognition for speakers who are fluent in both languages, as is very often the case among the Spanish-speaking population of the USA. We target a decoder that can be used in ...
Effects of Speech Recognition Accuracy on the Performance of DARPA Communicator Spoken Dialogue Systems
The DARPA Communicator program explored ways to construct better spoken-dialogue systems, with which users interact via speech alone to perform relatively complex tasks such as travel planning. During 2000 and 2001 two large data sets were collected from sessions in which paid users did travel planning using the Communicator systems that had been built by eight research groups. The research gro...
Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise
We propose a method that modifies the Mel cepstral coefficients of HMM-generated synthetic speech in order to increase the intelligibility of the generated speech when heard by a listener in the presence of a known noise. This method is based on an approximation we previously proposed for the Glimpse Proportion measure. Here we show how to update the Mel cepstral coefficients using this measure...
Improving TTS with Corpus-Specific Pronunciation Adaptation
Text-to-speech (TTS) systems are built on speech corpora which are labeled with carefully checked and segmented phonemes. However, phoneme sequences generated by automatic grapheme-to-phoneme converters during synthesis are usually inconsistent with those from the corpus, thus leading to poor quality synthetic speech signals. To solve this problem, the present work aims at adapting automaticall...