Improving accompanied Flamenco singing voice transcription by combining vocal detection and predominant melody extraction
Authors
Abstract
While recent approaches to automatic voice melody transcription of accompanied flamenco singing give promising results regarding pitch accuracy, guitar sections mistakenly transcribed as vocal melody remain a major limitation on the overall precision obtained. With the aim of reducing the number of false positives in the voicing detection, we propose a fundamental frequency contour estimation method which extends the pitch-salience-based predominant melody extraction [3] with a vocal detection classifier based on timbre and pitch contour characteristics. Pitch contour segments estimated by the predominant melody extraction algorithm that contain a high percentage of frames classified as non-vocal are rejected. After estimating the tuning frequency, the remaining pitch contour is segmented into single note events in an iterative approach. The resulting symbolic representations are evaluated against manually corrected transcriptions on a frame-by-frame level. For two small flamenco datasets covering a variety of singers and audio quality, we observe a significant reduction of the voicing false alarm rate and an improved voicing F-measure, as well as an increased overall transcription accuracy. We furthermore demonstrate the advantage of a vocal detection model trained on genre-specific material. The presented case study is limited to the transcription of flamenco singing, but the general framework can be extended to other styles with genre-specific instrumentation.
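The contour rejection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `reject_non_vocal_contours`, the representation of a contour as a `(start_frame, end_frame)` pair, and the threshold `max_non_vocal_ratio=0.5` are all assumptions made for illustration, since the abstract only states that contours with "a high percentage" of non-vocal frames are rejected.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a contour is a (start_frame, end_frame) pair,
# with end_frame exclusive.
Contour = Tuple[int, int]


def reject_non_vocal_contours(
    contours: Sequence[Contour],
    frame_is_vocal: Sequence[bool],
    max_non_vocal_ratio: float = 0.5,  # assumed value, not taken from the paper
) -> List[Contour]:
    """Keep only contours whose share of non-vocal frames is small enough.

    frame_is_vocal holds one per-frame decision from a vocal detection
    classifier (True means the frame was classified as vocal).
    """
    kept: List[Contour] = []
    for start, end in contours:
        frames = frame_is_vocal[start:end]
        if len(frames) == 0:
            # Empty or out-of-range contour: nothing to keep.
            continue
        non_vocal = sum(1 for is_vocal in frames if not is_vocal)
        if non_vocal / len(frames) <= max_non_vocal_ratio:
            kept.append((start, end))
    return kept


# Two contours of four frames each: the first is mostly vocal,
# the second is mostly non-vocal (e.g. a guitar passage).
decisions = [True, True, True, False, False, False, False, True]
result = reject_non_vocal_contours([(0, 4), (4, 8)], decisions)
```

In this example the first contour has a non-vocal ratio of 0.25 and is kept, while the second has a ratio of 0.75 and is rejected. In practice the threshold would have to be tuned on annotated data.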
Similar resources
Predominant Fundamental Frequency Estimation vs Singing Voice Separation for the Automatic Transcription of Accompanied Flamenco Singing
This work evaluates two strategies for predominant fundamental frequency (f0) estimation in the context of melodic transcription from flamenco singing with guitar accompaniment. The first strategy extracts the f0 from salient pitch contours computed from the mixed spectrum; the second separates the voice from the guitar and then performs monophonic f0 estimation. We integrate both approaches wi...
Simulated Formant Modeling of Accompanied Singing Signals for Vocal Melody Extraction
This paper deals with the task of extracting vocal melodies from accompanied singing recordings. The challenging aspect of this task consists in the tendency for instrumental sounds to interfere with the extraction of the desired vocal melodies, especially when the singing voice is not necessarily predominant among other sound sources. Existing methods in the literature are either rule-based or...
Melody Extraction on Vocal Segments Using Multi-Column Deep Neural Networks
Singing melody extraction is a task that tracks pitch contour of singing voice in polyphonic music. While the majority of melody extraction algorithms are based on computing a saliency function of pitch candidates or separating the melody source from the mixture, data-driven approaches based on classification have been rarely explored. In this paper, we present a classification-based approach f...
Singing Voice Melody Transcription Using Deep Neural Networks
This paper presents a system for the transcription of singing voice melodies in polyphonic music signals based on Deep Neural Network (DNN) models. In particular, a new DNN system is introduced for performing the f0 estimation of the melody, and another DNN, inspired from recent studies, is learned for segmenting vocal sequences. Preparation of the data and learning configurations related to th...
Transcription of vocal melodies using voice characteristics and algorithm fusion
This paper deals with the transcription of vocal melodies in music recordings. The proposed system relies on two distinct pitch estimators which exploit characteristics of the human singing voice. A Hidden Markov Model (HMM) is used to fuse the pitch estimates and make voicing decisions. The resulting performance is evaluated on the MIREX 2006 Audio Melody Extraction data.