Energy contour extraction for in-car speech recognition

Author

  • Tai-Hwei Hwang
Abstract

The time derivatives of speech energy, such as the delta and delta-delta log energy, are known to be critical features for automatic speech recognition (ASR). At lower signal-to-noise ratios (SNRs), however, their discriminative ability can be limited, or they can even become harmful, because the energy contour is corrupted. Taking advantage of the spectral characteristics of in-car noise, the speech energy contour is extracted from the high-pass filtered signal so as to reduce the distortion in the delta energy. Such filtering can be implemented with a pre-emphasis-like filter or by summing the energies of the higher frequency bands. The proposed method is evaluated on a Chinese name recognition task, using both real in-car speech and artificially generated in-car speech as test data. The experimental results show that the method improves recognition accuracy for in-car speech at lower SNRs as well as for clean speech.
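The pre-emphasis variant of this idea can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the filter coefficient 0.97, the 25 ms frames with a 10 ms hop at 16 kHz, the regression-style delta with a window of 2 frames, the synthetic test signal, and all function names. The low-frequency "rumble" merely stands in for in-car noise, on the assumption that such noise is concentrated at low frequencies.

```python
import numpy as np


def pre_emphasis(signal, coeff=0.97):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1]."""
    x = np.asarray(signal, dtype=float)
    return np.concatenate(([x[0]], x[1:] - coeff * x[:-1]))


def log_energy_contour(signal, frame_len=400, hop=160, floor=1e-10):
    """Log energy per frame (assumed 25 ms frames, 10 ms hop at 16 kHz)."""
    x = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    contour = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len]
        contour[i] = np.log(np.sum(frame ** 2) + floor)  # floor avoids log(0)
    return contour


def delta(contour, width=2):
    """Regression-style delta over +/- `width` frames, edges replicated."""
    c = np.asarray(contour, dtype=float)
    n = len(c)
    padded = np.pad(c, width, mode="edge")
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = np.zeros(n)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n]   # c[t + k]
        behind = padded[width - k : width - k + n]  # c[t - k]
        out += k * (ahead - behind)
    return out / denom


def demo_contours(fs=16000):
    """Synthetic stand-in: loud 50 Hz rumble plus a weaker 2 kHz burst."""
    t = np.arange(fs) / fs
    rumble = np.sin(2 * np.pi * 50 * t)
    burst = np.zeros_like(t)
    burst[6000:10000] = 0.3 * np.sin(2 * np.pi * 2000 * t[6000:10000])
    noisy = rumble + burst
    raw = log_energy_contour(noisy)
    high_passed = log_energy_contour(pre_emphasis(noisy))
    return raw, high_passed


raw, high_passed = demo_contours()
print("raw contour range:        ", raw.max() - raw.min())
print("high-passed contour range:", high_passed.max() - high_passed.min())
print("largest |delta|, raw / high-passed:",
      np.abs(delta(raw)).max(), np.abs(delta(high_passed)).max())
```

On this toy signal the raw contour is dominated by the rumble, so the burst barely moves it, while the high-passed contour should rise clearly at the burst and give a more pronounced delta. The alternative mentioned in the abstract, summing higher frequency band energies, is not shown here.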

Similar resources

Improved Melody Recognition Performance of a Cochlear Implant Speech Processing Strategy Using Instantaneous Frequency Encoding Based on Teager Energy Operator

We present a speech processing strategy incorporating instantaneous frequency (IF) encoding for the enhancement of melody recognition performance of cochlear implants. For the IF extraction from incoming sound, we propose the use of a Teager energy operator (TEO), which is advantageous for its lower computational load. From time-frequency analysis, we verified that the TEO-based method provides...

Classification of taiwanese tones based on pitch and energy movements

This paper addresses the difficulties associated with automatically distinguishing the seven Taiwanese tones. The tone recogniser is an essential component of any automatic speech recognition system customised for tone languages such as Taiwanese. We show that it is difficult to distinguish between the Taiwanese tones simply employing the fundamental frequency contours and that the task is simp...

Towards High Performance Phonotactic Feature for Spoken Language Recognition

With the demands of globalization, multilingual speech is increasingly common in conversational telephone speech, broadcast news and internet podcasts. Therefore, automatic spoken language recognition has become an important technology in multilingual speech related applications. For example, automatic spoken language recognition has been used as a preprocessing component for spoken language tr...

The Lombard Effect in Spontaneous Dialog Speech

The Lombard effect – environmental noise affects speech production – has already been studied extensively for read lab speech. In this study spontaneous dialog speech produced by 24 German speakers has been recorded under noisy conditions and analysed for the Lombard effect. A sophisticated experimental setup using behind-the-ear hearing aid equipment allows us to insert real car noise into the...

A Survey – Audio and Video Synchronization

Audio and video synchronization is essential. Loss of synchronization between image and sound continues to disturb viewers and irritate broadcasters. The goal is to ensure synchronization without altering the content while keeping cost low. The objective of synchronization is to align the audio and video signals, which are processed separately. T...

Publication date: 2003