Improving Amharic Speech Recognition System Using Connectionist Temporal Classification with Attention Model and Phoneme-Based Byte-Pair-Encodings
Abstract
Out-of-vocabulary (OOV) words are the most challenging problem in automatic speech recognition (ASR), especially for morphologically rich languages. Most end-to-end speech recognition systems operate at the word and character levels of a language. Amharic is a poorly resourced but morphologically rich language. This paper proposes a hybrid connectionist temporal classification (CTC) with attention end-to-end architecture and a syllabification algorithm for an Amharic automatic speech recognition system (AASR) using its phoneme-based subword units. The syllabification algorithm helps to insert the epenthetic vowel እ[ɨ], which is not included in our Grapheme-to-Phoneme (G2P) conversion algorithm developed using consonant–vowel (CV) representations of Amharic graphemes. The proposed end-to-end model was trained on various Amharic subwords, namely characters, phonemes, character-based subwords, and phoneme-based subwords generated by the byte-pair-encoding (BPE) segmentation algorithm. Experimental results showed that context-dependent phoneme-based subwords tend to result in more accurate speech recognition systems than their character-based, phoneme-based, and character-based subword counterparts. Further improvement was also obtained for the proposed phoneme-based subwords with the syllabification algorithm and the SpecAugment data augmentation technique. The word error rate (WER) reduction was 18.38% compared to character-based acoustic modeling with a word-based recurrent neural network language modeling (RNNLM) baseline. These phoneme-based subword models are also useful for improving machine and speech translation tasks.
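To make the idea of phoneme-based BPE subwords concrete, the following is a minimal Python sketch of BPE merge learning applied to phoneme sequences rather than characters. It is an illustration only and not the authors' implementation: the function names (`learn_bpe`, `apply_bpe`, `merge_pair`), the tie-breaking rule, and the toy corpus are assumptions, and the toy symbols are placeholders rather than real Amharic G2P output.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merges from words given as phoneme tuples."""
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Assumed tie-break: highest count first, then smallest pair in sort order.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def apply_bpe(word, merges):
    """Segment one phoneme sequence by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

# Hypothetical toy corpus: placeholder phoneme symbols, not real transcriptions.
TOY_CORPUS = [
    ("s", "e", "l", "a", "m"),
    ("s", "e", "l", "a", "m"),
    ("s", "e", "m", "a", "y"),
    ("l", "a", "m"),
]

if __name__ == "__main__":
    merges = learn_bpe(TOY_CORPUS, num_merges=3)
    print(merges)
    # A sequence absent from the corpus is still segmented into known subwords.
    print(apply_bpe(("l", "a", "m", "s", "e"), merges))
```

Because an unseen phoneme sequence can still be decomposed into learned subword units, a segmentation of this kind is one way to reduce the OOV problem the abstract describes. A production system would more likely use an established subword toolkit than hand-written merge learning.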
Similar resources
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
Advancing Connectionist Temporal Classification With Attention Modeling
In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework. In particular, we derive new context vectors using time convolution features to model attention as part of the CTC network. To further improve attention modeling, we utilize content information extracted from a network r...
Multiple pronunciation model for Amharic speech recognition system
This paper attempts to show the pattern variations of sound units in the Amharic language for a multiple pronunciation model. These are variations of sound units at the lexical level due to dialects. An attempt is then made to build a pronunciation dictionary for Automatic Speech Recognition (ASR). Finally, comments and recommendations are included. Amharic is an official language of Ethiopia...
Grapheme-to-Phoneme Conversion for Amharic Text-to-Speech System
Developing a correct Grapheme-to-Phoneme (GTP) conversion method is a central problem in text-to-speech synthesis. In particular, deriving phonological features that are not shown in the orthography is challenging. In the Amharic language, geminates and epenthetic vowels are crucial for proper pronunciation, but neither is shown in the orthography. This paper describes an architecture, a preprocessing...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research shows that using a deep neural network (DNN) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...
Journal
Journal title: Information
Year: 2021
ISSN: 2078-2489
DOI: https://doi.org/10.3390/info12020062