A Supervised Factorial Acoustic Model for Simultaneous Multiparticipant Vocal Activity Detection in Close-talk Microphone Recordings of Meetings

Authors

Kornel Laskowski

Tanja Schultz

Abstract

Using automatic speech recognition (ASR) word error rates (WERs) as a metric, the systems in (1) and (3) appear to yield similar performance, in spite of significant additional architectural differences. Systems of type (2) have not been fielded for segmentation for ASR, and therefore cannot be directly compared. Although approaches of type (3) offer a significant advantage, namely the opportunity to directly constrain the number of simultaneously vocalizing participants, they come with the caveat of a variable acoustic vector size, since conversations and meetings can have variable numbers of participants. To overcome this difficulty, unsupervised acoustic models have been deployed [4], which do not require acoustic model training data (or training time). Our previous work has shown that this severely limits the number of features, as well as the minimum frame size. The aim of the current work is to develop a supervised acoustic model for scenario (3), capable of producing accurate density estimates for large feature vectors extracted from short frames.

Similar resources

Detection of Laughter-in-Interaction in Multichannel Close-Talk Microphone Recordings of Meetings

Laughter is a key element of human-human interaction, occurring surprisingly frequently in multi-party conversation. In meetings, laughter accounts for almost 10% of vocalization effort by time, and is known to be relevant for topic segmentation and the automatic characterization of affect. We present a system for the detection of laughter, and its attribution to specific participants, which re...

Modeling Vocal Interaction for Segmentation in Meeting Recognition

Automatic segmentation is an important technology for both automatic speech recognition and automatic speech understanding. In meetings, participants typically vocalize for only a fraction of the recorded time, but standard vocal activity detection algorithms for close-talk microphones in meetings continue to treat participants independently. In this work we present a multispeaker segmentation ...

Time-frequency analysis of the glottal opening

Wolfgang Wokurek, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, Azenbergstraße 12, 70174 Stuttgart, Germany [email protected] ABSTRACT Simultaneous recordings of the laryngograph signal and speech recorded in a non-reverberating environment are investigated for acoustic evidence of the glottal opening within the microphone signal. It is demonstrated that the ...

Spatio-Temporal Analysis of Spontaneous Speech with Microphone Arrays

Accurate detection, localization and tracking of multiple moving speakers permits a wide spectrum of applications. Techniques are required that are versatile, robust to environmental variations, and not constraining for non-technical end-users. Based on distant recording of spontaneous multiparty conversations, this thesis focuses on the use of microphone arrays to address the question “Who spo...

A MEMS Capacitive Microphone Modelling for Integrated Circuits

In this paper, a model of a MEMS capacitive microphone is presented for integrated circuits. The microphone has a diaphragm thickness of 1 μm, dimensions of 0.5 × 0.5 mm², and an air gap of 1.0 μm. Using the analytical and simulation results, the important features of the MEMS capacitive microphone, such as pull-in voltage and sensitivity, are found to be 3.8 V and 6.916 mV/Pa, respectively, while there is no...

Publication date: 2007
