Predicting Human Perceived Accuracy of ASR Systems

Authors

Taniya Mishra

Andrej Ljolje

Mazin Gilbert

Abstract

Word error rate (WER), which is the most commonly used method of measuring automatic speech recognition (ASR) accuracy, penalizes all types of ASR errors equally. However, humans differentially weigh different types of ASR errors. They judge ASR errors that distort the meaning of the spoken message more harshly than those that do not. Aiming to align more closely with human perception of ASR accuracy, we developed a new metric HPA (Human Perceived Accuracy) that predicts the subjective perceived accuracy of ASR transcriptions. HPA is computed based on the central idea of differential weighting of different ASR errors. Applied to the particular task of automatically recognizing voicemails, we found that the correlation between HPA and the human judgement of ASR accuracy was significantly higher (r-value=0.91) than the correlation between WER and human judgement (r-value=0.65).
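The contrast between uniform and differential error weighting can be illustrated with a minimal sketch. This is not the authors' HPA formula, which the abstract does not specify. It assumes a hypothetical function-word list and hypothetical weights (0.25 for function words, 1.0 otherwise), and it assumes a non-empty reference transcript. It only shows how two transcripts with identical WER can receive different scores once errors are weighted by how much meaning they are likely to carry.

```python
from typing import Callable, List

# Hypothetical function-word list (an assumption; not from the source).
FUNCTION_WORDS = {"a", "an", "the", "to", "of", "in", "on", "at", "and", "is", "um", "uh"}


def uniform_cost(word: str) -> float:
    """Standard WER: every word error costs the same."""
    return 1.0


def meaning_cost(word: str) -> float:
    """Assumed weights: errors on function words are penalized less."""
    return 0.25 if word in FUNCTION_WORDS else 1.0


def weighted_edit_cost(ref: List[str], hyp: List[str],
                       cost: Callable[[str], float]) -> float:
    """Minimum total cost to turn ref into hyp with per-word error costs."""
    # d[i][j] = minimum cost to turn ref[:i] into hyp[:j]
    d = [[0.0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        d[i][0] = d[i - 1][0] + cost(ref[i - 1])      # deletions only
    for j in range(1, len(hyp) + 1):
        d[0][j] = d[0][j - 1] + cost(hyp[j - 1])      # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            r, h = ref[i - 1], hyp[j - 1]
            sub = 0.0 if r == h else max(cost(r), cost(h))
            d[i][j] = min(d[i - 1][j] + cost(r),      # delete r
                          d[i][j - 1] + cost(h),      # insert h
                          d[i - 1][j - 1] + sub)      # match or substitute
    return d[-1][-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: (S + D + I) / number of reference words."""
    r, h = ref.split(), hyp.split()
    return weighted_edit_cost(r, h, uniform_cost) / len(r)


def weighted_error_rate(ref: str, hyp: str) -> float:
    """Weighted error cost normalized by the total weight of the reference."""
    r, h = ref.split(), hyp.split()
    return weighted_edit_cost(r, h, meaning_cost) / sum(meaning_cost(w) for w in r)


ref = "call me at the office tomorrow"
dropped_article = "call me at office tomorrow"     # meaning preserved
wrong_day = "call me at the office today"          # meaning changed

print(wer(ref, dropped_article), wer(ref, wrong_day))
print(weighted_error_rate(ref, dropped_article), weighted_error_rate(ref, wrong_day))
```

Both hypotheses contain exactly one word error, so WER treats them identically, while the weighted score penalizes the substitution of "tomorrow" more heavily than the dropped article. Any real metric of this kind would need weights fitted to human judgements rather than the fixed values assumed here.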


Related resources

Predicting Barge-in Utterance Errors by using Implicitly-Supervised ASR Accuracy and Barge-in Rate per User

Modeling of individual users is a promising way of improving the performance of spoken dialogue systems deployed for the general public and utilized repeatedly. We define “implicitly-supervised” ASR accuracy per user on the basis of responses following the system’s explicit confirmations. We combine the estimated ASR accuracy with the user’s barge-in rate, which represents how well the user is ...


Concept Form Adaptation in Human-Computer Dialog

In this work we examine user adaptation to a dialog system’s choice of realization of task-related concepts. We analyze forms of the time concept in the Let’s Go! spoken dialog system. We find that users adapt to the system’s choice of time form. We also find that user adaptation is affected by perceived system adaptation. This means that dialog systems can guide users’ word choice and can adap...


Predicting ASR errors by exploiting barge-in rate of individual users for spoken dialogue systems

We exploit the barge-in rate of individual users to predict automatic speech recognition (ASR) errors. A barge-in is a situation in which a user starts speaking during a system prompt, and it can be detected even when ASR results are not reliable. Such features not using ASR results can be a clue for managing a situation in which user utterances cannot be successfully recognized. Since individu...


Factors that influence the performance of experienced speech recognition users.

Performance on automatic speech recognition (ASR) systems for users with physical disabilities varies widely between individuals. The goal of this study was to discover some key factors that account for that variation. Using data from 23 experienced ASR users with physical disabilities, the effect of 20 different independent variables on recognition accuracy and text entry rate with ASR was mea...


DNN Online with iVectors Acoustic Modeling and Doc2Vec Distributed Representations for Improving Automated Speech Scoring

When applying automated speech-scoring technology to the rating of globally administered real assessments, there are several practical challenges: (a) ASR accuracy on non-native spontaneous speech is generally low; (b) due to the data mismatch between an ASR system's training stage and its final usage, the recognition accuracy obtained in practice is even lower; (c) content-relevance was not wid...



Publication date: 2011
