Learning to Pinpoint Singing Voice from Weakly Labeled Examples

نویسنده

  • Jan Schlüter
چکیده

Building an instrument detector usually requires temporally accurate ground truth that is expensive to create. However, song-wise information on the presence of instruments is often easily available. In this work, we investigate how well we can train a singing voice detection system merely from song-wise annotations of vocal presence. Using convolutional neural networks, multipleinstance learning and saliency maps, we can not only detect singing voice in a test signal with a temporal accuracy close to the state-of-the-art, but also localize the spectral bins with precision and recall close to a recent source separation method. Our recipe may provide a basis for other sequence labeling tasks, for improving source separation or for inspecting neural networks trained on auditory spectrograms.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Modeling musical articulation gestures in singing voice performances

We present a procedure to automatically describe musical articulation gestures used in singing voice performances. We detail a method to characterize temporal evolution of fundamental frequency and energy contours by a set of piece-wise fitting techniques. Based on this, we propose a meaningful parameterization that allows reconstructing contours from a compact set of parameters at different le...

متن کامل

Singing Voice Separation Using Deep Neural Networks and F0 Estimation

Deep Neural Networks (DNN) have become a popular approach for speech enhancement, and singing voice separation. DNNs are typically trained to estimate a timefrequency mask using ground truth examples. In this submission, we combine DNN estimation as a first step with traditional refinement via F0 estimation, using the YINFFT algorithm.

متن کامل

Semi-Supervised Learning with Very Few Labeled Training Examples

In semi-supervised learning, a number of labeled examples are usually required for training an initial weakly useful predictor which is in turn used for exploiting the unlabeled examples. However, in many real-world applications there may exist very few labeled training examples, which makes the weakly useful predictor difficult to generate, and therefore these semisupervised learning methods c...

متن کامل

Classification-Based Singing Melody Extraction Using Deep Convolutional Neural Networks

Singing melody extraction is the task that identifies the melody pitch contour of singing 1 voice from polyphonic music. Most of the traditional melody extraction algorithms are based on 2 calculating salient pitch candidates or separating the melody source from the mixture. Recently, 3 classification-based approach based on deep learning has drawn much attentions. In this paper, 4 we present a...

متن کامل

Multiple-Feature Fusion Based Onset Detection for Solo Singing Voice

Onset detection is a challenging problem in automatic singing transcription. In this paper, we address singing onset detection with three main contributions. First, we outline the nature of a singing voice and present a new singing onset detection approach based on supervised machine learning. In this approach, two Gaussian Mixture Models (GMMs) are used to classify audio features of onset fram...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016