Unified decoding and feature representation for improved speech recognition
Authors
Abstract
In this paper we propose a unified framework for decoding and feature representation based on the maximum a posteriori (MAP) principle. The search space is augmented with an additional feature-stream dimension, so that different feature representations can be used for different phonetic contexts within the HMM decoding framework. We also provide a theoretical explanation for the unified framework. The framework yields “supervised” signal processing and feature extraction for the recognition system, and it reduced the word recognition error rate by 15% on a large-vocabulary continuous speech recognition task when multiple feature streams were used simultaneously.
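The abstract does not state the decoding equations, so the following is only an illustrative sketch of what searching over an extra feature-stream dimension could look like, not the authors' implementation. It assumes that per-frame emission log-scores are already available for every HMM state and every feature stream (`log_b`), that scores from different streams are directly comparable, and that a state-dependent stream log-prior (`log_w`) exists; the function name and all parameter names are hypothetical.

```python
def viterbi_multistream(log_pi, log_A, log_b, log_w):
    """Viterbi search over an augmented (state, stream) space.

    log_pi: [S] initial state log-probabilities
    log_A:  [S][S] transition log-probabilities, log_A[i][j] = log P(j | i)
    log_b:  [T][S][K] emission log-scores per frame, state, and stream
    log_w:  [S][K] assumed log-prior of using stream k in state s
    Returns (best log-score, state path, stream path).
    """
    T, S, K = len(log_b), len(log_pi), len(log_w[0])

    # The stream choice does not affect transitions in this sketch, so the
    # best stream can be picked independently for each (frame, state) pair.
    emit = [[0.0] * S for _ in range(T)]
    best_stream = [[0] * S for _ in range(T)]
    for t in range(T):
        for s in range(S):
            scores = [log_b[t][s][k] + log_w[s][k] for k in range(K)]
            k_best = max(range(K), key=scores.__getitem__)
            best_stream[t][s] = k_best
            emit[t][s] = scores[k_best]

    # Standard Viterbi recursion over states, using the stream-maximized scores.
    delta = [log_pi[s] + emit[0][s] for s in range(S)]
    backptr = [[0] * S for _ in range(T)]
    for t in range(1, T):
        new_delta = []
        for j in range(S):
            i_best = max(range(S), key=lambda i: delta[i] + log_A[i][j])
            backptr[t][j] = i_best
            new_delta.append(delta[i_best] + log_A[i_best][j] + emit[t][j])
        delta = new_delta

    # Backtrace the best state path, then read off the stream used per frame.
    last = max(range(S), key=delta.__getitem__)
    states = [last]
    for t in range(T - 1, 0, -1):
        states.append(backptr[t][states[-1]])
    states.reverse()
    streams = [best_stream[t][states[t]] for t in range(T)]
    return delta[last], states, streams
```

In this simplified form the maximization over streams factorizes per frame, so the extra dimension adds only a factor of K to the emission computation. A decoder that ties the stream choice to phonetic context, as the abstract describes, would instead constrain which streams each state may use, for example through `log_w`.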
Similar resources
A new method for robust speech recognition based on missing data using a bidirectional neural network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the time-frequency representation of the signal (the spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, and then replaced using the remaining components and statistical ...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Local gradient pattern - A novel feature representation for facial expression recognition
Many researchers adopt the Local Binary Pattern for pattern analysis. However, the long histogram created by the Local Binary Pattern is not suitable for large-scale facial databases. This paper presents a simple facial pattern descriptor for facial expression recognition. The local pattern is computed based on the local gradient flow from one side to the other through the center pixel in a 3x3 pixel region...
Including uncertainty of speech observations in robust speech recognition
Noise compensation methods for speech recognition provide a cleaned version of the speech representation. Usually this cleaned version is the expected value of the speech parameters given the observed noisy speech and the noise statistics. A more realistic representation should include the probability distribution of the cleaned speech instead of its expected value in order to represent the unce...
Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms
One of the important issues in speech emotion recognition is selecting appropriate feature sets in order to improve the detection rate and classification accuracy. In previous studies, researchers tried to select appropriate features for classification by using feature selection and feature-space reduction methods, such as Fisher and PCA. In this research, a hybrid evolutionary algorit...