Improving Amharic Speech Recognition System Using Connectionist Temporal Classification with Attention Model and Phoneme-Based Byte-Pair-Encodings

Authors

Abstract

Out-of-vocabulary (OOV) words are the most challenging problem in automatic speech recognition (ASR), especially for morphologically rich languages. Most end-to-end systems operate at the word or character level of a language, and Amharic is a poorly resourced, morphologically rich language. This paper proposes a hybrid connectionist temporal classification (CTC) with attention architecture and a syllabification algorithm for an Amharic automatic speech recognition (AASR) system using phoneme-based subword units. The syllabification algorithm helps to insert the epenthetic vowel እ[ɨ], which is not included in our Grapheme-to-Phoneme (G2P) conversion developed from consonant–vowel (CV) representations of Amharic graphemes. The proposed model was trained on various subword units, namely characters, phonemes, and character-based and phoneme-based subwords generated by the byte-pair-encoding (BPE) segmentation algorithm. Experimental results showed that context-dependent phoneme-based subwords tend to yield more accurate recognition than their character-based and phoneme-based counterparts. Further improvement was also obtained with the SpecAugment data augmentation technique. A word error rate (WER) reduction of 18.38% was achieved compared to an acoustic model with a word-based recurrent neural network language model (RNNLM) baseline. These models may also be useful for improving machine translation tasks.
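As a rough sketch of the phoneme-based BPE idea described above (not the paper's actual implementation), the toy Python below learns merge operations over phoneme sequences rather than over raw characters; the corpus, phoneme symbols, and merge count are hypothetical.

```python
from collections import Counter

# Minimal BPE over phoneme sequences (toy sketch, not the paper's implementation).
# Each "word" is a tuple of phoneme symbols, e.g. the output of a G2P converter.

def count_pairs(corpus):
    """Count adjacent phoneme-pair frequencies across the corpus."""
    pairs = Counter()
    for word, freq in corpus.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(corpus, pair):
    """Replace every occurrence of the pair with a single merged subword unit."""
    merged, new_sym = {}, "".join(pair)
    for word, freq in corpus.items():
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(new_sym)
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Hypothetical phonemized corpus: phoneme sequence -> frequency.
corpus = {
    ("s", "ɨ", "l", "a", "m"): 5,
    ("s", "ɨ", "r", "a"): 3,
    ("b", "ɨ", "r", "a"): 2,
}

num_merges = 4  # vocabulary-size knob (hypothetical)
for _ in range(num_merges):
    pairs = count_pairs(corpus)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    corpus = merge_pair(corpus, best)
    print("merged", best)
```

In practice a toolkit such as SentencePiece or subword-nmt would be used; the only point here is that the merges operate on G2P output (after epenthetic-vowel insertion) instead of on raw graphemes.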


Similar Articles

Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs are clustered, and the attributes of the speech clusters are extracted as secondary features. However, the shape and position of speech clusters change over time, so the clusters are temporally tracked and temporal tra...
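As a generic illustration only (the article's actual clustering and tracking procedure is not reproduced here), the sketch below clusters each frame's feature vectors and matches centroids to the previous frame so that cluster identities persist over time; the shapes, cluster count, and matching rule are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

def track_clusters(frames, k=3):
    """Cluster each frame and align centroids with the previous frame by
    minimum total distance, yielding per-cluster trajectories over time."""
    prev_centroids, trajectories = None, []
    for feats in frames:                          # feats: (n_points, n_dims) per frame
        centroids = KMeans(n_clusters=k, n_init=10).fit(feats).cluster_centers_
        if prev_centroids is not None:
            # cost[i, j] = distance between previous centroid i and current centroid j
            cost = np.linalg.norm(prev_centroids[:, None] - centroids[None, :], axis=-1)
            _, col = linear_sum_assignment(cost)
            centroids = centroids[col]            # reorder so identities persist
        trajectories.append(centroids)
        prev_centroids = centroids
    return np.stack(trajectories)                 # (n_frames, k, n_dims)

# Hypothetical spectro-temporal features: 20 frames of 50 points in 8 dims.
rng = np.random.default_rng(0)
frames = [rng.normal(size=(50, 8)) for _ in range(20)]
print(track_clusters(frames).shape)               # (20, 3, 8)
```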


Advancing Connectionist Temporal Classification With Attention Modeling

In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework. In particular, we derive new context vectors using time convolution features to model attention as part of the CTC network. To further improve attention modeling, we utilize content information extracted from a network r...
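A minimal sketch of the core idea, attending over a local window of encoder states to build per-frame context vectors; the scoring vector, window size, and plain-NumPy setup are stand-ins for the learned time-convolution features described in the study.

```python
import numpy as np

def local_attention_context(hidden, window=2, w=None):
    """For each time step t, attend over hidden states in [t-window, t+window]
    and return their attention-weighted sum as a context vector (toy sketch)."""
    T, d = hidden.shape
    w = np.random.randn(d) if w is None else w    # scoring vector (stand-in for learned params)
    contexts = np.zeros_like(hidden)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        scores = hidden[lo:hi] @ w                # one scalar score per neighboring state
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                      # softmax over the local window
        contexts[t] = alpha @ hidden[lo:hi]       # context vector for step t
    return contexts

hidden = np.random.randn(100, 64)                 # hypothetical encoder outputs (T=100, d=64)
print(local_attention_context(hidden).shape)      # (100, 64)
```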


Multiple pronunciation model for Amharic speech recognition system

In this paper, the researchers have tried to show the pattern variations of sound units in the Amharic language for a multiple pronunciation model. These are variations of sound units at the lexical level due to dialects. After that, an attempt is made to build a pronunciation dictionary for Automatic Speech Recognition (ASR). At last, comments and recommendations are included. Amharic is an official language of Ethiopia...
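For illustration, a multiple-pronunciation lexicon can be represented as a mapping from each word to a list of phoneme-sequence variants; the entries and phoneme symbols below are hypothetical, not drawn from the paper.

```python
# Toy multi-pronunciation lexicon: word -> list of phoneme sequences (dialectal variants).
lexicon = {
    "selam": [["s", "ä", "l", "a", "m"], ["s", "ɨ", "l", "a", "m"]],
    "buna":  [["b", "u", "n", "a"]],
}

def pronunciations(word):
    """Return all known pronunciations; an ASR decoder typically expands each variant."""
    return lexicon.get(word, [])

for variant in pronunciations("selam"):
    print(" ".join(variant))
```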


Grapheme-to-Phoneme Conversion for Amharic Text-to-Speech System

Developing a correct Grapheme-to-Phoneme (G2P) conversion method is a central problem in text-to-speech synthesis. In particular, deriving phonological features that are not shown in the orthography is challenging. In the Amharic language, geminates and epenthetic vowels are crucial for proper pronunciation, but neither is shown in the orthography. This paper describes an architecture, a preprocessing...
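A heavily simplified sketch of the CV-based G2P idea, assuming a toy grapheme table and a naive epenthesis rule; real Amharic epenthesis and gemination handling are considerably more involved and are not shown.

```python
# Toy CV grapheme table: each fidel symbol maps to (consonant, vowel); sixth-order
# symbols carry no inherent vowel. A tiny illustrative fragment only.
CV = {
    "ስ": ("s", ""), "ም": ("m", ""), "ል": ("l", ""),
    "ሰ": ("s", "ä"), "ላ": ("l", "a"), "መ": ("m", "ä"),
}
VOWELS = {"ä", "u", "i", "a", "e", "ɨ", "o"}

def g2p(word):
    """Expand graphemes to phonemes using the CV table (no epenthesis yet)."""
    phones = []
    for ch in word:
        c, v = CV[ch]
        phones.append(c)
        if v:
            phones.append(v)
    return phones

def insert_epenthetic(phones):
    """Very simplified epenthesis: break consonant clusters with ɨ (toy rule)."""
    out = []
    for i, p in enumerate(phones):
        out.append(p)
        nxt = phones[i + 1] if i + 1 < len(phones) else None
        if p not in VOWELS and nxt is not None and nxt not in VOWELS:
            out.append("ɨ")
    return out

print(insert_epenthetic(g2p("ስም")))   # ['s', 'ɨ', 'm']  ("name")
```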


Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. There are two phases in DNN-based phoneme recognition systems: training and testing. Mos...



Journal

Journal title: Information

Year: 2021

ISSN: 2078-2489

DOI: https://doi.org/10.3390/info12020062