A Supervised Factorial Acoustic Model for Simultaneous Multiparticipant Vocal Activity Detection in Close-talk Microphone Recordings of Meetings
Abstract
Using automatic speech recognition (ASR) word error rates (WERs) as a metric, the systems in (1) and (3) appear to yield similar performance, in spite of significant additional architectural differences. Systems of type (2) have not been fielded for ASR segmentation, and therefore cannot be directly compared. Although approaches of type (3) offer a significant advantage, namely the opportunity to directly constrain the number of simultaneously vocalizing participants, they come with the caveat of a variable acoustic vector size, since conversations and meetings can have variable numbers of participants. To overcome this difficulty, unsupervised acoustic models have been deployed [4], which do not require acoustic model training data (or training time). Our previous work has shown that this approach severely limits both the number of features and the minimum frame size. The aim of the current work is to develop a supervised acoustic model for scenario (3), capable of producing accurate density estimates for large feature vectors extracted from short frames.
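The constraint on simultaneously vocalizing participants mentioned above can be illustrated with a minimal sketch. It assumes each participant is either vocalizing (1) or silent (0) in a frame, and enumerates only the joint states that respect a cap on overlap. The function name `joint_states` and its parameters are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def joint_states(num_participants, max_simultaneous):
    """Enumerate joint vocal-activity states as tuples of 0/1 flags.

    Only states with at most `max_simultaneous` participants vocalizing
    are kept (illustrative assumption, not the paper's implementation).
    """
    states = []
    limit = min(max_simultaneous, num_participants)
    for k in range(limit + 1):
        # Choose which k participants are vocalizing in this joint state.
        for active in combinations(range(num_participants), k):
            state = tuple(1 if i in active else 0 for i in range(num_participants))
            states.append(state)
    return states


if __name__ == "__main__":
    # Four participants, at most two vocalizing at once.
    for state in joint_states(4, 2):
        print(state)
```

The length of each state tuple equals the number of participants, which is one way to see why the size of a joint representation varies from meeting to meeting.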
Similar resources
Detection of Laughter-in-Interaction in Multichannel Close-Talk Microphone Recordings of Meetings
Laughter is a key element of human-human interaction, occurring surprisingly frequently in multi-party conversation. In meetings, laughter accounts for almost 10% of vocalization effort by time, and is known to be relevant for topic segmentation and the automatic characterization of affect. We present a system for the detection of laughter, and its attribution to specific participants, which re...
Modeling Vocal Interaction for Segmentation in Meeting Recognition
Automatic segmentation is an important technology for both automatic speech recognition and automatic speech understanding. In meetings, participants typically vocalize for only a fraction of the recorded time, but standard vocal activity detection algorithms for close-talk microphones in meetings continue to treat participants independently. In this work we present a multispeaker segmentation ...
Time-frequency analysis of the glottal opening
Wolfgang Wokurek, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, Azenbergstraße 12, 70174 Stuttgart, Germany. ABSTRACT: Simultaneous recordings of the laryngograph signal and speech recorded in a non-reverberating environment are investigated for acoustic evidence of the glottal opening within the microphone signal. It is demonstrated that the ...
Spatio-Temporal Analysis of Spontaneous Speech with Microphone Arrays
Accurate detection, localization and tracking of multiple moving speakers permits a wide spectrum of applications. Techniques are required that are versatile, robust to environmental variations, and not constraining for non-technical end-users. Based on distant recording of spontaneous multiparty conversations, this thesis focuses on the use of microphone arrays to address the question “Who spo...
A MEMS Capacitive Microphone Modelling for Integrated Circuits
In this paper, a model of a MEMS capacitive microphone is presented for integrated circuits. The microphone has a diaphragm thickness of 1 μm, dimensions of 0.5 × 0.5 mm², and an air gap of 1.0 μm. Using the analytical and simulation results, the important features of the MEMS capacitive microphone, such as pull-in voltage and sensitivity, are obtained as 3.8 V and 6.916 mV/Pa, respectively, while there is no...