Optimizing Speech Recognition Evaluation Using Stratified Sampling

Authors

  • Janne Pylkkönen
  • Thomas Drugman
  • Max Bisani
Abstract

Producing large enough quantities of high-quality transcriptions for accurate and reliable evaluation of an automatic speech recognition (ASR) system can be costly. It is therefore desirable to minimize the manual transcription work needed to produce metrics with an agreed precision. In this paper, we demonstrate how to improve ASR evaluation precision using stratified sampling. We show that by altering the sampling, the deviations observed in the error metrics can be reduced by up to 30% compared to random sampling, or alternatively, the same precision can be obtained on about 30% smaller datasets. We compare different variants for conducting stratified sampling, including a novel sample allocation scheme tailored for word error rate. Experimental evidence is provided to assess the effect of different sampling schemes on evaluation precision.
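The idea behind the abstract can be illustrated with a small simulation. The sketch below is not the authors' method: it uses textbook proportional and Neyman allocation rather than the paper's allocation scheme tailored for word error rate, it estimates mean errors per utterance instead of the WER ratio, and the data, the two strata, and the helper names (`allocate`, `stratified_estimate`, `spread`) are all assumptions made for illustration. The printed deviations are therefore not comparable to the figures reported in the paper.

```python
import random
import statistics


def allocate(strata, n, scheme):
    """Split a budget of n transcriptions across strata (hypothetical helper)."""
    if scheme == "proportional":      # share follows stratum size
        weights = {k: len(v) for k, v in strata.items()}
    elif scheme == "neyman":          # share follows size * within-stratum std dev
        weights = {k: len(v) * statistics.pstdev(v) for k, v in strata.items()}
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(weights.values())
    # Rounding can move the total slightly away from n; keep >= 1 per stratum.
    return {k: max(1, round(n * w / total)) for k, w in weights.items()}


def stratified_estimate(strata, alloc, rng):
    """Size-weighted mean of the per-stratum sample means."""
    population = sum(len(v) for v in strata.values())
    return sum(
        len(v) / population * statistics.fmean(rng.sample(v, alloc[k]))
        for k, v in strata.items()
    )


def spread(estimator, trials=500, seed=0):
    """Standard deviation of an estimator over repeated sampling."""
    rng = random.Random(seed)
    return statistics.pstdev([estimator(rng) for _ in range(trials)])


# Synthetic per-utterance error counts; the two strata are an assumption,
# e.g. utterances binned by recognizer confidence.
data_rng = random.Random(1)
strata = {
    "easy": [data_rng.choice([0, 0, 0, 1]) for _ in range(6000)],
    "hard": [data_rng.randint(2, 8) for _ in range(2000)],
}
pool = [x for v in strata.values() for x in v]
n = 200  # transcription budget

alloc_prop = allocate(strata, n, "proportional")
alloc_neyman = allocate(strata, n, "neyman")

sd_random = spread(lambda rng: statistics.fmean(rng.sample(pool, n)))
sd_prop = spread(lambda rng: stratified_estimate(strata, alloc_prop, rng))
sd_neyman = spread(lambda rng: stratified_estimate(strata, alloc_neyman, rng))

print(f"random sampling:          {sd_random:.4f}")
print(f"stratified, proportional: {sd_prop:.4f}")
print(f"stratified, Neyman:       {sd_neyman:.4f}")
```

In this toy setup the Neyman allocation reads the within-stratum standard deviations directly from the full synthetic population. With real data those values are unknown before transcription and would have to be estimated from some proxy, which is one reason the choice of allocation scheme matters in practice.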

Similar resources

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

Optimizing Expected Word Error Rate via Sampling for Speech Recognition

State-level minimum Bayes risk (sMBR) training has become the de facto standard for sequence-level training of speech recognition acoustic models. It has an elegant formulation using the expectation semiring, and gives large improvements in word error rate (WER) over models trained solely using cross-entropy (CE) or connectionist temporal classification (CTC). sMBR training optimizes the expecte...

Improved spoken language translation using n-best speech recognition hypotheses

We intended to demonstrate the effect of using N-best speech recognition hypotheses for improving speech translation performance. A log-linear model, which integrated features from speech recognition and statistical machine translation, was used to rescore the translation candidates. Model parameters were estimated by optimizing an objectively measurable but subjectively relevant translation q...

Optimizing Wavelet Parameters for Dereverberation in Automatic Speech Recognition

We present an optimization method of the wavelet parameters for dereverberation in automatic speech recognition (ASR). By tuning the wavelet parameters to improve the acoustic model likelihood, wavelet-based dereverberation methods become more effective in the ASR application. We evaluate several existing wavelet-based methods and optimize them, based on our proposed scheme. Experimental evalua...

Optimization of Cost Function Weights for Unit Selection Speech Synthesis Using Speech Recognition

A well-known problem in unit selection speech synthesis is designing the join and target function sub-costs and optimizing their corresponding weights so that they reflect the human listeners’ preferences. To achieve this, we propose a procedure where an objective criterion for optimal speech unit selection is used. The objective criterion for tuning the cost function weights is based on automat...

Publication date: 2016