Duration normalization and hypothesis combination for improved spontaneous speech recognition

2003

Jon P. Nedel Richard M. Stern

When phone segmentations are known a priori, normalizing the duration of each phone has been shown to be effective in overcoming weaknesses in duration modeling of Hidden Markov Models (HMMs). While we have observed potential relative reductions in word error rate (WER) of up to 34.6% with oracle segmentation information, it has been difficult to achieve significant improvement in WER with segm...

Multi-Reference Evaluation for Dialectal Speech Recognition System: A Study for Egyptian ASR

2015

Ahmed M. Ali Walid Magdy Steve Renals

Dialectal Arabic has no standard orthographic representation. This creates a challenge when evaluating an Automatic Speech Recognition (ASR) system for dialect. Since the reference transcription text can vary widely from one user to another, we propose an innovative approach for evaluating dialectal speech recognition using Multi-References. For each recognized speech segment, we ask five diff...

Recent improvements in voicemail transcription

1999

Mukund Padmanabhan George Saon Sankar Basu Jing Huang Geoffrey Zweig

In this paper we report recent improvements in voicemail transcription. The voicemail transcription task was introduced last year [1] as representing a style of conversational telephone speech that is somewhat different from the Switchboard and CallHome [2] databases. Last year, the speaker independent and speaker adapted word error rates (WER) on this task were reported at 41.94% and 38.18% re...

Automatic Error Analysis for Morphologically Rich Languages

2011

Ahmed El Kholy Nizar Habash

This paper presents AMEANA, an open-source tool for error analysis for natural language processing tasks targeting morphologically rich languages. Unlike standard evaluation metrics such as BLEU or WER, AMEANA automatically provides a detailed error analysis that can help researchers and developers better understand the strengths and weaknesses of their systems. AMEANA is easily adaptable to any...

Improving the ensemble speaker and speaking environment modeling approach by enhancing the precision of the online estimation process

2008

Yu Tsao Chin-Hui Lee

In this paper, we study methods to enhance the precision of the online estimation process of a recently proposed approach, ensemble speaker and speaking environment modeling (ESSEM), and therefore improve its overall performance. The ESSEM approach consists of two integral phases, offline and online. In the offline phase, an ensemble environment configuration is prepared by a large collection o...

Fabricating conversational speech data with acoustic models: a program to examine model-data mismatch

1998

Don McAllaster Larry Gillick Francesco Scattone Michael Newman

We present a study of data simulated using acoustic models trained on Switchboard data, and then recognized using various Switchboard-trained acoustic models. When we recognize real Switchboard conversations, simple development models give a word error rate (WER) of about 47 percent. If instead we simulate the speech data using word transcriptions of the conversation, obtaining the pronunciatio...

An Information Theoretic Measure of Sequence Recognition Performance (IDIAP-Com 02-03)

1998

Andrew C. Morris

Sequence recognition performance is often summarised first in terms of the number of hits (H), substitutions (S), deletions (D) and insertions (I), and then as a single statistic by the “word error rate” WER = 100(S+D+I)/(H+S+D). While in common use, WER has two disadvantages as a performance measure. One is that it has no upper bound, so it doesn’t tell you how good a system is, only that one ...
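The WER formula quoted in this abstract can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the example counts are hypothetical and do not come from the paper. The second call shows the missing upper bound the abstract mentions, since insertions count as errors but do not add to the reference length H+S+D.

```python
def word_error_rate(hits: int, subs: int, dels: int, ins: int) -> float:
    """Return WER = 100 * (S + D + I) / (H + S + D) as a percentage."""
    # H + S + D is the number of words in the reference transcription.
    n_ref = hits + subs + dels
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return 100.0 * (subs + dels + ins) / n_ref


# Hypothetical counts, chosen for illustration only.
print(word_error_rate(hits=8, subs=1, dels=1, ins=0))  # 20.0
# Many insertions against a short reference push WER above 100.
print(word_error_rate(hits=2, subs=0, dels=0, ins=5))  # 250.0
```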

Combination of words and word categories in varigram histories

1999

Reinhard Blasig

This paper presents a new kind of language models: category/word varigrams. This special model type permits a tight integration of word-based and category-based modeling of word sequences. Any succession of words and word categories may be employed to describe a given word history. This provides a much greater flexibility than previous combinations of word-based and category-based language models. ...

Analysis of the Characteristics of Talk-show TV Programs

2012

Fabio Brugnara Daniele Falavigna Diego Giuliani Roberto Gretter

We examined the content of 2 talk-show TV programs in order to better understand the challenges posed by this program genre to automatic transcription. Six talk-show episodes were first segmented, transcribed and annotated by experts. Most of the speech content was found in conversational style with a significant portion of overlapped speech, about 18%. Then, automatic speech recognition experi...

Effects of word error rate in the DARPA communicator data during 2000 and 2001

2002

Gregory A. Sanders Audrey N. Le John S. Garofolo

During 2000 and 2001 two large data collections were performed, with paid users. We analyze the effects of speech recognition accuracy, as measured by Word Error Rate (WER), on other metrics. Analysis shows a linear correlation between WER and the Task Completion metrics, and (unexpectedly) this relationship remains more or less linear even for quite high values of WER. The picture for User Sat...
