Speech-to-Text Conversion in Indonesian Language Using a Deep Bidirectional Long Short-Term Memory Algorithm
Abstract
Nowadays, speech is also used for communication between humans and computers, which requires conversion from speech to text. Nevertheless, few studies have been performed on speech-to-text conversion in the Indonesian language, and most were limited to the conversion of datasets with incomplete sentences. In this study, speech-to-text conversion of complete sentences in the Indonesian language was performed using a deep bidirectional long short-term memory (LSTM) algorithm. Spectrograms and Mel frequency cepstral coefficients (MFCCs) were utilized as features for a total of 5000 speech data spoken by ten subjects (five males and five females). The results showed that the deep bidirectional LSTM algorithm successfully converted speech to text in Indonesian. The accuracy achieved with MFCCs was higher than that achieved with spectrograms; the MFCCs obtained the best accuracy, with a word error rate value of 0.2745%, while the spectrograms obtained 2.0784%. Thus, MFCCs are more suitable than spectrograms as a feature for speech-to-text conversion in Indonesian. The results of this study will help in the implementation of communication tools in Indonesian and other languages.
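The abstract reports its results as word error rate (WER) but does not say how the metric was computed. The sketch below assumes the conventional definition, the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The function name `word_error_rate` and the Indonesian example sentences are illustrative assumptions, not material from the paper or its dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumed conventional WER definition; the paper's exact
    evaluation procedure is not given in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences: one word ("ke") is dropped from four.
wer = word_error_rate("saya pergi ke pasar", "saya pergi pasar")
print(f"WER = {wer:.2%}")
```

Under this definition a value is usually aggregated over a whole test set by summing the edit distances and the reference lengths before dividing, rather than by averaging per-sentence scores; which of the two the study used is not stated.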
Similar resources
Modelling Radiological Language with Bidirectional Long Short-Term Memory Networks
Motivated by the need to automate medical information extraction from free-text radiological reports, we present a bi-directional long short-term memory (BiLSTM) neural network architecture for modelling radiological language. The model has been used to address two NLP tasks: medical named-entity recognition (NER) and negation detection. We investigate whether learning several types of word emb...
Speech dereverberation using long short-term memory
Recently, neural networks have been used for not only phone recognition but also denoising and dereverberation. However, the conventional denoising deep autoencoder (DAE) based on the feed-forward structure is not capable of handling very long speech frames of reverberation. LSTM can be effectively trained to reduce the average error between the enhanced signal and the original clean signal by ...
Deep Bidirectional Long Short-Term Memory Recurrent Neural Networks for Grapheme-to-Phoneme Conversion Utilizing Complex Many-to-Many Alignments
Efficient grapheme-to-phoneme (G2P) conversion models are considered indispensable components to achieve the state-of-the-art performance in modern automatic speech recognition (ASR) and text-to-speech (TTS) systems. The role of these models is to provide such systems with a means to generate accurate pronunciations for unseen words. Recent work in this domain is based on recurrent neural networ...
The Effects of Keyword and Context Methods on Pronunciation and Receptive/Productive Vocabulary of Low-Intermediate Iranian EFL Learners: Short-Term and Long-Term Memory in Focus
A large body of research to date has affirmed, in one way or another, the usefulness of vocabulary learning strategies in a foreign language. This study examines the effect of two different methods of teaching English vocabulary (keyword and context) on the pronunciation and vocabulary knowledge of low-intermediate Iranian learners of English, and on its retention in memory. To this end, sixty Iranian language learners aged eight to fourteen with...
Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks
Motivation: Capturing long-range interactions between structural but not sequence neighbors of proteins is a long-standing challenging problem in bioinformatics. Recently, long short-term memory (LSTM) networks have significantly improved the accuracy of speech and image classification problems by remembering useful past information in long sequential events. Here, we have implemented deep bidir...
Journal
Journal title: International Journal of Advanced Computer Science and Applications
Year: 2021
ISSN: 2158-107X, 2156-5570
DOI: https://doi.org/10.14569/ijacsa.2021.0120327