SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition

Authors

Abstract

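Error correction in automatic speech recognition (ASR) aims to correct the incorrect words in sentences generated by ASR models. Since recent ASR models usually have a low word error rate (WER), error correction models should only modify incorrect words to avoid affecting originally correct tokens, and therefore detecting incorrect words is important for error correction. Previous works on error correction either implicitly detect error words through target-source attention or CTC (connectionist temporal classification) loss, or explicitly locate specific deletion/substitution/insertion errors. However, implicit error detection does not provide a clear signal about which tokens are incorrect, and explicit error detection suffers from low detection accuracy. In this paper, we propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection. Specifically, we first detect whether a token is correct or not through a probability produced by a dedicatedly designed language model, and then design a constrained CTC loss that duplicates only the detected incorrect tokens to let the decoder focus on the correction of error tokens. Compared with implicit error detection with CTC loss, SoftCorrect provides an explicit signal about which words are incorrect and thus does not need to duplicate every token but only incorrect tokens; compared with explicit error detection, SoftCorrect does not detect specific deletion/substitution/insertion errors but just leaves it to the CTC loss. Experiments on the AISHELL-1 and Aidatatang datasets show that SoftCorrect achieves 26.1% and 9.4% CER reduction respectively, outperforming previous works by a large margin, while still enjoying the fast speed of parallel generation.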
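
The selective duplication step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `duplicate_detected_tokens`, the `threshold` of 0.5, and the `copies` count of 3 are all assumptions chosen for illustration, and the per-token probabilities are supplied by hand rather than produced by a language model. It only shows the idea that tokens flagged as likely incorrect are repeated in the decoder input while confident tokens are passed through once.

```python
from typing import List, Sequence


def duplicate_detected_tokens(
    tokens: Sequence[str],
    probs: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
    copies: int = 3,         # hypothetical duplication count, not taken from the paper
) -> List[str]:
    """Repeat tokens whose correctness probability falls below `threshold`.

    `probs[i]` stands in for the probability that `tokens[i]` is correct,
    which in the abstract comes from a dedicated language model.
    """
    if len(tokens) != len(probs):
        raise ValueError("tokens and probs must have the same length")

    expanded: List[str] = []
    for token, prob in zip(tokens, probs):
        # Low-probability tokens get extra slots so a CTC-style decoder
        # has room to substitute, insert, or delete around them.
        repeat = copies if prob < threshold else 1
        expanded.extend([token] * repeat)
    return expanded


if __name__ == "__main__":
    hypothesis = ["a", "b", "c"]
    correctness = [0.9, 0.2, 0.8]  # made-up probabilities for illustration
    print(duplicate_detected_tokens(hypothesis, correctness))
    # prints ['a', 'b', 'b', 'b', 'c']
```

Only the middle token is expanded here, which mirrors the contrast the abstract draws with approaches that duplicate every token.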

Similar resources

Error Detection in Automatic Speech Recognition

We offer a supervised machine learning approach for recognizing erroneous words in the output of a speech recognizer. We have investigated several sets of features combined with two word configurations, and compared the performance of two classifiers: Decision Trees and Naïve Bayes. Evaluation was performed on a corpus of 400 spoken referring expressions, with Decision Trees yielding a high rec...

Recent Improvements on Error Detection for Automatic Speech Recognition

Automatic speech recognition (ASR) offers the ability to access the semantic content present in spoken language within audio and video documents. While acoustic models based on deep neural networks have recently significantly improved the performance of ASR systems, automatic transcriptions still contain errors. Errors perturb the exploitation of these ASR outputs by introducing noise to the te...

Context-based Speech Recognition Error Detection and Correction

In this paper we present preliminary results of a novel unsupervised approach for high-precision detection and correction of errors in the output of automatic speech recognition systems. We model the likely contexts of all words in an ASR system vocabulary by performing a lexical co-occurrence analysis using a large corpus of output from the speech system. We then identify regions in the data th...

Robust Error Correction of Continuous Speech Recognition

We present a post-processing technique for correcting errors committed by an arbitrary continuous speech recognizer. The technique leverages our observation that consistent recognition errors arising from mismatched training and usage conditions can be modeled and corrected. We have implemented a post-processor called SPEECHPP to correct word-level errors, and we show that this post-processing te...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i11.26531