Advanced acoustic modelling techniques in MP3 speech recognition

Authors

  • Michal Borsky
  • Petr Pollák
  • Petr Mizera

Abstract

The automatic recognition of MP3-compressed speech is a challenge for current systems because lossy compression irreversibly degrades the speech waveform. This article evaluates the performance of a recognition system optimized for MP3-compressed speech using current state-of-the-art acoustic modelling techniques and one specific front-end compensation method. It concentrates on acoustic model adaptation, discriminative training, and additional dithering as prominent means of compensating for this distortion in phoneme recognition and large vocabulary continuous speech recognition (LVCSR). The experiments on the phoneme task show a dramatic increase in the recognition error for unvoiced speech units as a direct result of compression. Acoustic model adaptation yielded the highest relative contribution, while the gain from discriminative training diminished as the bit-rate decreased. Additional dithering yielded a consistent improvement only for the MFCC features, and even with it the overall results were still worse than those for the PLP features.
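The abstract names additional dithering as a front-end compensation method. The sketch below is a minimal illustration only, assuming that dithering means adding low-level random noise to the decoded waveform before spectral analysis (as common speech front-ends do). It is not the authors' implementation. The functions `power_spectrum`, `log_band_energy` and `add_dither`, the Gaussian noise, the amplitude of 1.0, and the toy 64-sample tone are all hypothetical choices. The example shows why a frequency band left empty (here by construction, loosely mimicking content discarded by low bit-rate compression) pushes its log energy toward minus infinity, which is the quantity log-based features such as MFCCs are built on, and how a small noise floor keeps that value finite.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum |X[k]|^2 for k = 0..N/2 (adequate for a toy frame)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def log_band_energy(spectrum, lo, hi):
    """Natural log of the summed power in bins lo..hi-1; -inf if the band is exactly empty."""
    energy = sum(spectrum[lo:hi])
    return math.log(energy) if energy > 0.0 else float("-inf")


def add_dither(frame, amplitude=1.0, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `amplitude` to every sample.

    Hypothetical helper: Gaussian noise and amplitude=1.0 (on a 16-bit sample scale)
    are illustrative assumptions, not the settings used in the paper.
    """
    rng = random.Random(seed)
    return [x + amplitude * rng.gauss(0.0, 1.0) for x in frame]


# Toy band-limited frame: a single tone exactly in bin 4 of a 64-sample frame, so the
# upper band (bins 20-31) holds essentially no energy apart from rounding error.
N = 64
frame = [1000.0 * math.sin(2 * math.pi * 4 * t / N) for t in range(N)]

clean = log_band_energy(power_spectrum(frame), 20, 32)
dithered = log_band_energy(power_spectrum(add_dither(frame)), 20, 32)
print(f"log high-band energy without dither: {clean:.1f}")
print(f"log high-band energy with dither:    {dithered:.1f}")
```

Without dither, the empty band yields a hugely negative (or infinite) log energy. With dither, it settles near the log of the injected noise power. The amplitude is a trade-off: it must be large enough to fill empty bands but small enough not to mask genuine low-energy speech.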

Similar resources

Investigation of Acoustic Modelling Techniques for Lvcsr Systems

The CUHTK evaluation systems typically make use of a multiple pass, multiple branch, framework. This allows a range of acoustic models to be used in the framework and the output from all the systems, or branch, to be combined to give the final output. This paper describes experiments with several advanced acoustic modelling techniques that were candidate approaches for the 2004 CU-HTK large voc...

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Pronunciation Modelling for Conversational Speech Recognition: a Status Report from Ws97

Accurately modelling pronunciation variability in conversational speech is an important component for automatic speech recognition. We describe some of the projects undertaken in this direction at WS97, the Fifth LVCSR Summer Workshop, held at Johns Hopkins University, Baltimore, in July-August, 1997. We first illustrate a use of hand-labelled phonetic transcriptions of a portion of the Switchb...

Combined acoustic and pronunciation modelling for non-native speech recognition

In this paper, we present several adaptation methods for nonnative speech recognition. We have tested pronunciation modelling, MLLR and MAP non-native pronunciation adaptation and HMM models retraining on the HIWIRE foreign accented English speech database. The “phonetic confusion” scheme we have developed consists in associating to each spoken phone several sequences of confused phones. In our...

Interfacing Speech Recognition and Vision Guided

One goal of a pervasive computing environment is to allow the user to interact with the environment in an easy and natural manner. The use of spoken commands, as inputs to a speech recognition system, is one such way to naturally interact with the environment. In challenging acoustic environments, microphone arrays can improve the quality of the input audio signal by beamforming, or steering, t...

Journal title:
  • EURASIP J. Audio, Speech and Music Processing

Volume: 2015

Publication date: 2015