Human beatbox sound recognition using an automatic speech recognition toolkit
Abstract
Human beatboxing is a vocal art making use of speech organs to produce drum sounds and imitate musical instruments. Beatbox sound classification is a current challenge that can be used for automatic database annotation and music-information retrieval. In this study, a large-vocabulary human-beatbox sound recognition system was developed with an adaptation of the Kaldi toolbox, a widely-used tool for automatic speech recognition. The corpus consisted of eighty boxemes, which were recorded repeatedly by two beatboxers. The sounds were annotated and transcribed to the system by means of a beatbox-specific morphographic writing system (Vocal Grammatics). The robustness of the recognition system to recording conditions was assessed on recordings made with six different microphones and settings. The decoding part was made with monophone acoustic models trained with a classical HMM-GMM model. A change of acoustic features (MFCC, PLP, Fbank) and a variation of several parameters of the recognition system were tested: (i) the number of HMM states, (ii) the number of MFCC, (iii) the presence or not of a pause boxeme in the right and left contexts in the lexicon and (iv) the rate of silence probability. Our best model was obtained with the addition of a pause in the left and right contexts of each boxeme in the lexicon, a 0.8 silence probability, 22 MFCC and a three-state HMM. The boxeme error rate in such a configuration was lowered to 13.65%, and 8.6 boxemes out of 10 were well recognized. The recording settings did not greatly affect system performance, apart from the closed-cup technique.
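The abstract reports a boxeme error rate but does not spell out how it is computed. A minimal sketch follows, assuming it is defined like the word error rate used in speech recognition: substitutions, deletions and insertions from an edit-distance alignment, divided by the number of reference boxemes. The function name and the boxeme labels are hypothetical and are not taken from the paper or from Vocal Grammatics.

```python
def boxeme_error_rate(reference, hypothesis):
    """Return (S + D + I) / N over boxeme tokens, analogous to word error rate.

    Assumption: the metric is computed like WER; the source does not give
    the exact definition.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one boxeme")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and hypothesis[:j]; start with the empty reference prefix.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference boxeme
                curr[j - 1] + 1,     # insertion of a spurious boxeme
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[m] / n


# Hypothetical labels, for illustration only.
ref = ["kick", "hihat", "snare", "hihat", "kick"]
hyp = ["kick", "hihat", "hihat", "kick"]
print(f"{boxeme_error_rate(ref, hyp):.2%}")
```

Under this definition, insertions also count as errors, so the error rate and the proportion of well-recognized boxemes need not sum exactly to 100%.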
Similar resources
Modelling human speech recognition using automatic speech recognition paradigms in SpeM
We have recently developed a new model of human speech recognition, based on automatic speech recognition techniques [1]. The present paper has two goals. First, we show that the new model performs well in the recognition of lexically ambiguous input. These demonstrations suggest that the model is able to operate in the same optimal way as human listeners. Second, we discuss how to relate the b...
Discrete Speech Recognition Using a Hausdorff Based Metric - An Automatic Word-Based Speech Recognition Approach
In this work we provide an automatic speaker-independent word-based discrete speech recognition approach. Our proposed method consists of several processing levels. First, a word-based audio segmentation is performed, then a feature extraction is applied on the obtained segments. The speech feature vectors are computed using a delta delta mel cepstral vocal sound analysis. Then, a minimum dista...
Comparing Human and Automatic Speech Recognition Using Word Gating
This paper describes a study in which we compare human and automatic recognition of words in fluent and disfluent spontaneous speech. In a word-level gating study with confidence judgements, we examine how the recognition and confidence of recognition of words by humans develops over utterances and show how disfluency disrupts the process. We give an automatic recogniser the same task and compa...
Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit
This paper describes audio-visual speech recognition experiments on a multi-speaker, large vocabulary corpus using the Janus speech recognition toolkit. We describe a complete audio-visual speech recognition system and present experiments on this corpus. By using visual cues as additional input to the speech recognizer, we observed good improvements, both on clean and noisy speech in our experi...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
Journal
Journal title: Biomedical Signal Processing and Control
Year: 2021
ISSN: 1746-8094, 1746-8108
DOI: https://doi.org/10.1016/j.bspc.2021.102468