Robust Speech Activity Detection in Interactive Smart-Room Environments

Authors

  • Dusan Macho
  • Climent Nadeu
  • Andrey Temko
Abstract

In perceptive interface technologies for smart-room environments, determining speech activity is one of the key objectives. Because of environmental noise and reverberation, a robust Speech Activity Detection (SAD) system is required. In previous work, a SAD system using features extracted by Linear Discriminant Analysis (LDA) and a Decision Tree classifier compared favorably with other previously reported techniques in a set of room-environment tests on the SPEECON database. In this work, the same SAD system is tested under more realistic conditions involving meetings, and it is modified to significantly improve its performance. Specifically, we trained the SAD system on a subset of the SPEECON data and, without any further tuning, used it to run tests on the meeting databases from the NIST RT05 evaluation. To improve SAD performance, we consider two additional features that measure energy dynamics at low and high frequencies, respectively. In addition, two alternative classifiers are tested, based on Support Vector Machines and Gaussian Mixture Models, respectively. With the latter classifier, using both the LDA features and the low-frequency energy dynamics feature, a large improvement in speech detection performance is observed: for example, the NIST error rate was reduced from 20.69% to 8.47% on the RT05 evaluation data. We also report the results obtained with a slightly modified version of the SAD system in the NIST RT06 evaluation.
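
The abstract quotes a NIST error rate without defining it. As a rough illustration only, the sketch below assumes a frame-level metric of the common form (missed speech + false alarms) / reference speech; the function name `sad_error_rate` and the frame labels are hypothetical, and the official RT05 scoring works on timed segments with its own rules, so this is not the evaluation tool itself.

```python
def sad_error_rate(reference, hypothesis):
    """Assumed frame-level SAD error: (missed + false alarms) / reference speech.

    reference, hypothesis: equal-length sequences of 0 (non-speech) / 1 (speech).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    # Speech frames in the reference that the detector labelled non-speech.
    missed = sum(1 for r, h in zip(reference, hypothesis) if r and not h)
    # Non-speech frames in the reference that the detector labelled speech.
    false_alarm = sum(1 for r, h in zip(reference, hypothesis) if h and not r)
    # Normalise by the amount of reference speech, not by total duration.
    speech = sum(1 for r in reference if r)
    if speech == 0:
        raise ValueError("reference contains no speech frames")

    return (missed + false_alarm) / speech


if __name__ == "__main__":
    # Hypothetical toy labels: one missed frame, one false alarm, six speech frames.
    ref = [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
    hyp = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
    print(f"SAD error rate: {sad_error_rate(ref, hyp):.2%}")
```

Because the denominator is reference speech rather than total time, a metric of this form can exceed 100% when false alarms are frequent, which is worth keeping in mind when comparing figures such as 20.69% and 8.47%.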

Similar Resources

Automatic Speech Recognition and Speech Activity Detection in the CHIL Smart Room

An important step to bring speech technologies into wide deployment as a functional component in man-machine interfaces is to free the users from close-talk or desktop microphones, and enable far-field operation in various natural communication environments. In this work, we consider far-field automatic speech recognition and speech activity detection in conference rooms. The experiments are co...

Systems for Robust Speech Activity Detection and Their Results with the RT05 and RT06 Evaluation Tests

Robust Speech Activity Detection (SAD) systems are required in smart-room environments due to the presence of noises and reverberation. In this work, a previous SAD system, based on LDA-extracted features and a decision tree classifier, has been modified in terms of both feature extraction and classification to significantly improve its performance. New features based on the low- and high-frequenc...

Robust Speech Recognition with Microphone Arrays in Multi-room Home Environments

This paper presents a set of exploratory experiments addressed to analyse and evaluate the performance of baseline speech processing components for distant voice command recognition applications in domestic environments. The analysis, conducted in a multi-channel multi-room scenario, showed the importance of adequate room detection and channel selection strategies to obtain acceptable performan...

Smart headphones: enhancing auditory awareness through robust speech detection and source localization

We describe a method for enhancing auditory awareness by selectively passing speech sounds in the environment to the user. We develop a robust far-field speech detection algorithm for noisy environments and a source localization algorithm for flexible arrays. We then combine these methods to give a user control over the spatial regions from which speech will be passed through. Using this techni...

Reverberation-Robust Online Multi-Speaker Tracking by Using a Microphone Array and CASA Processing

Online tracking of speakers is an important task for applications in smart environments such as camera control, meeting annotation and speech separation. Challenges for an audio-only system are small-room reverberation, noise, the unknown number of speakers, and gaps occurring in natural speech. Combining models from neurobiology and cognitive psychology with many-channel signal processing and ...

Publication date: 2006