Context-dependent sound event detection
Abstract
The work presented in this article studies how context information can be used in the automatic sound event detection process and how the detection system can benefit from such information. Humans use context information to make more accurate predictions about sound events and to rule out events that are unlikely in a given context. We propose a similar use of context information in automatic sound event detection. The proposed approach is composed of two stages: an automatic context recognition stage and a sound event detection stage. Contexts are modeled using Gaussian mixture models, and sound events are modeled using three-state left-to-right hidden Markov models. In the first stage, the audio context of the test signal is recognized. Based on the recognized context, a context-specific set of sound event classes is selected for the sound event detection stage. The event detection stage also uses context-dependent acoustic models and count-based event priors. Two alternative event detection approaches are studied. In the first, a monophonic event sequence is output by detecting the most prominent sound event at each time instant using Viterbi decoding. The second introduces a new method for producing a polyphonic event sequence by detecting multiple overlapping sound events using multiple restricted Viterbi passes. A new metric is introduced to evaluate sound event detection performance at various levels of polyphony. It combines the detection accuracy and a coarse time-resolution error into one metric, making it simpler to compare the performance of detection algorithms. The two-stage approach was found to improve the results substantially compared to the context-independent baseline system. At the block level, the detection accuracy can be almost doubled by using the proposed context-dependent event detection.
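The two-stage idea above (recognize the context first, then decode only that context's event classes with count-based priors) can be sketched in Python with NumPy. This is a minimal illustration under loudly stated assumptions, not the authors' implementation. Each event is modeled by a single-state diagonal-covariance GMM instead of a three-state left-to-right HMM. The fixed `switch_penalty` is an assumed stand-in for the paper's transition model. All function names, context names (`street`, `office`), event labels and counts are hypothetical. Only the monophonic Viterbi variant is shown; the polyphonic multiple-restricted-pass method and the block-level metric are not covered.

```python
import numpy as np

def diag_gmm_loglik(X, weights, means, variances):
    """Per-frame log-likelihood of X (T x D) under a diagonal-covariance GMM."""
    diff = X[:, None, :] - means[None, :, :]                            # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)     # (K,)
    log_exp = -0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, K)
    log_comp = np.log(weights)[None, :] + log_norm[None, :] + log_exp  # (T, K)
    m = log_comp.max(axis=1, keepdims=True)                             # log-sum-exp over components
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

def recognize_context(X, context_gmms):
    """Stage 1: choose the context whose GMM gives the highest total log-likelihood."""
    scores = {name: diag_gmm_loglik(X, *gmm).sum() for name, gmm in context_gmms.items()}
    return max(scores, key=scores.get)

def count_based_log_priors(counts, events):
    """Log prior per event, proportional to its count in the context's training data."""
    c = np.array([counts[e] for e in events], dtype=float)
    return np.log(c / c.sum())

def viterbi_monophonic(frame_ll, log_prior, switch_penalty=5.0):
    """Most likely single event per frame; frame_ll is (T, E).
    Simplification (assumption): one state per event, not a three-state HMM."""
    T, E = frame_ll.shape
    # Switching to event j costs a fixed penalty plus j's log prior; staying is free.
    trans = np.full((E, E), -switch_penalty) + log_prior[None, :]
    np.fill_diagonal(trans, 0.0)
    delta = log_prior + frame_ll[0]
    backptr = np.zeros((T, E), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + trans  # cand[i, j]: best path ending in i, then moving to j
        backptr[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + frame_ll[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(delta.argmax())
    for t in range(T - 1, 0, -1):  # trace the best path backwards
        path[t - 1] = backptr[t, path[t]]
    return path

def detect_events(X, context_gmms, event_gmms, event_counts, switch_penalty=5.0):
    """Two stages: recognize the context, then decode only that context's events."""
    context = recognize_context(X, context_gmms)
    events = sorted(event_gmms[context])  # context-specific event set
    frame_ll = np.stack([diag_gmm_loglik(X, *event_gmms[context][e]) for e in events], axis=1)
    log_prior = count_based_log_priors(event_counts[context], events)
    path = viterbi_monophonic(frame_ll, log_prior, switch_penalty)
    return context, [events[i] for i in path]

def gmm(mean):
    """Hypothetical toy model: a one-component, unit-variance GMM."""
    return np.array([1.0]), np.array([mean], dtype=float), np.ones((1, len(mean)))

# Toy usage with hypothetical contexts, events and counts.
context_gmms = {"street": gmm([0, 0]), "office": gmm([10, 10])}
event_gmms = {"street": {"car": gmm([-3, 0]), "footsteps": gmm([3, 0])},
              "office": {"keyboard": gmm([10, 10])}}
event_counts = {"street": {"car": 30, "footsteps": 10}, "office": {"keyboard": 5}}
X = np.array([[-3, 0]] * 3 + [[3, 0]] * 3, dtype=float)
print(detect_events(X, context_gmms, event_gmms, event_counts))
# → ('street', ['car', 'car', 'car', 'footsteps', 'footsteps', 'footsteps'])
```

Restricting decoding to the recognized context's event set is what rules out unlikely events: in this toy example, `keyboard` can never be detected in a recording recognized as `street`, however its model scores.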
Similar resources
Sound Event Detection and Context Recognition
Humans can easily segregate and recognize one sound source from an acoustic mixture, and recognize a certain voice from a busy background which includes other people talking and music. Sound event detection and classification aims to process an acoustic signal and convert it into descriptions of the corresponding sound events present at the scene. This is useful, e.g., for automatic tagging in ...
Neural representations of auditory input accommodate to the context in a dynamically changing acoustic environment.
The auditory scene is dynamic, changing from 1 min to the next as sound sources enter and leave our space. How does the brain resolve the problem of maintaining neural representations of the distinct yet changing sound sources? We used an auditory streaming paradigm to test the dynamics of multiple sound source representation, when switching between integrated and segregated sound streams. The ...
Context-Dependent Event Detection in Sensor Networks
Event-based systems are well suited for application in sensor networks. Compared to the traditional application domains of event-based systems, however, sensor networks impose a number of challenging requirements. In this short paper, we present an architecture designed to address these requirements: the CoDED platform for context-dependent event detection.
Sound Event Detection for Real Life Audio DCASE Challenge
We explore logistic regression classifier (LogReg) and deep neural network (DNN) on the DCASE 2016 Challenge for task 3, i.e., sound event detection in real life audio. Our models use the Mel Frequency Cepstral Coefficients (MFCCs) and their deltas and accelerations as detection features. The error rate metric favors the simple logistic regression model with high activation threshold on both se...
Neurophysiological evidence for context-dependent encoding of sensory input in human auditory cortex.
Attention biases the way in which sound information is stored in auditory memory. Little is known, however, about the contribution of stimulus-driven processes in forming and storing coherent sound events. An electrophysiological index of cortical auditory change detection (mismatch negativity [MMN]) was used to assess whether sensory memory representations could be biased toward one organizati...
Journal: EURASIP J. Audio, Speech and Music Processing
Volume: 2013
Publication date: 2013