Music Structural Segmentation by Combining Harmonic and Timbral Information

Authors

  • Ruofeng Chen
  • Ming Li
Abstract

We propose a novel model for music structural segmentation that combines harmonic and timbral information. We use two-level clustering with splitting initialization and random turbulence to produce segment labels, using chroma and MFCC separately as features. We construct a score matrix to combine the segment labels from both aspects. Finally, Nonnegative Matrix Factorization and Maximum Likelihood are applied to extract the final segment labels. By comparing sparseness, our method can automatically determine the number of segment types in a given song.

1. SEGMENTATION ALGORITHM

This music structure segmentation algorithm is based on the model described in [1].

(1) Chroma and MFCC features are extracted from the audio using the algorithms in MIRToolbox [footnote 1].

(2) A two-level clustering algorithm calculates window-based segment labels using either chroma or MFCC as the feature. The algorithm includes a random turbulence module, so it outputs a different segmentation result each time it runs. The two-level clustering is repeated to obtain T segmentation results using chroma and T segmentation results using MFCC (we call them the chroma solution and the MFCC solution).

(3) A score matrix is built to count how many times two windows have identical segment labels in both the chroma solution and the MFCC solution.

(4) Non-negative Matrix Factorization (NMF) is applied to the score matrix to approximate it by W × H, for rank = 3, 4, and 5.

(5) Sparseness is calculated over all columns of the three resulting H matrices, and the H with the highest average sparseness is selected.

(6) Maximum Likelihood is applied to all columns of the selected H to obtain the final segmentation result (we call it the final solution).

(7) A post-processing step is applied to the final solution to remove isolated short segments and to differentiate “intro” and “outro”.

The flowgraph is shown on the next page. The two-level clustering algorithm is expanded in detail. The only difference between this algorithm and the one described in [1] is the post-processing module attached at the end. Please refer to [1] for a detailed description of the algorithm.

2. REFERENCES

[1] R. Chen, M. Li: “Music Structural Segmentation by Combining Harmonic and Timbral Information,” ISMIR, Miami, Florida, 2011.

Footnote 1: https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. © 2011 International Society for Music Information Retrieval.
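Steps (3) to (6) of the segmentation algorithm can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the text does not confirm: the t-th chroma run is paired with the t-th MFCC run when counting agreements, NMF is computed with plain multiplicative updates, sparseness is Hoyer's L1/L2 measure, and "Maximum Likelihood" is read as taking the largest entry in each column of H. All function names are hypothetical.

```python
import numpy as np

def score_matrix(chroma_runs, mfcc_runs):
    """S[i, j] = number of runs in which windows i and j share a label
    in BOTH the chroma run and the MFCC run (assumed pairing: run t with run t)."""
    chroma_runs = np.asarray(chroma_runs)  # shape (T, N): T runs, N windows
    mfcc_runs = np.asarray(mfcc_runs)      # shape (T, N)
    same_chroma = chroma_runs[:, :, None] == chroma_runs[:, None, :]
    same_mfcc = mfcc_runs[:, :, None] == mfcc_runs[:, None, :]
    return (same_chroma & same_mfcc).sum(axis=0).astype(float)

def nmf(S, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate S by W @ H using multiplicative updates (an assumed solver)."""
    rng = np.random.default_rng(seed)
    n, m = S.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ S) / (W.T @ W @ H + eps)
        W *= (S @ H.T) / (W @ H @ H.T + eps)
    return W, H

def mean_column_sparseness(H, eps=1e-12):
    """Average Hoyer sparseness over the columns of H (assumed measure):
    1 for a one-hot column, 0 for a constant column."""
    r = H.shape[0]
    l1 = np.abs(H).sum(axis=0)
    l2 = np.sqrt((H ** 2).sum(axis=0)) + eps
    return float(np.mean((np.sqrt(r) - l1 / l2) / (np.sqrt(r) - 1)))

def segment(chroma_runs, mfcc_runs, ranks=(3, 4, 5)):
    """Return one label per window and the selected number of segment types."""
    S = score_matrix(chroma_runs, mfcc_runs)
    candidates = [nmf(S, r)[1] for r in ranks]        # one H per rank
    best_H = max(candidates, key=mean_column_sparseness)
    labels = best_H.argmax(axis=0)                    # assumed reading of step (6)
    return labels, best_H.shape[0]
```

Calling `segment(chroma_runs, mfcc_runs)` with two integer arrays of shape (T, N) returns a label for each of the N windows and the rank that was selected. The score matrix here is built through a (T, N, N) boolean array, which is convenient for short examples but memory-hungry for long songs; feature extraction, the two-level clustering, and the post-processing of step (7) are not covered.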

Similar resources

Music Structural Segmentation Across Genres with Gammatone Features

Music structural segmentation (MSS) studies to date mainly employ audio features describing the timbral, harmonic or rhythmic aspects of the music and are evaluated using datasets consisting primarily of Western music. A new dataset of Chinese traditional Jingju music with structural annotations is introduced in this paper to complement the existing evaluation framework. We discuss some statist...

WP 2.1.11. Blind Temporal Segmentation

An optimistic functional definition for the Music Browser would require segmentation of complex mixtures a song into: • segments corresponding to different rhythm patterns • segments corresponding to different timbral or textural patterns • segments corresponding to different melodic/harmonic patterns • segments corresponding to different “structural” patterns (e.g. introduction, theme A, theme ...

Phrase-Level Audio Segmentation of Jazz Improvisations Informed by Symbolic Data

Computational music structure analysis encompasses any model attempting to organize music into qualitatively salient structural units, which can include anything in the hierarchy of large scale form, down to individual phrases and notes. While much existing audio-based segmentation work attempts to capture repetition and homogeneity cues useful at the form and thematic level, the time scales in...

Classifying Music Audio with Timbral and Chroma Features

Music audio classification has most often been addressed by modeling the statistics of broad spectral features, which, by design, exclude pitch information and reflect mainly instrumentation. We investigate using instead beat-synchronous chroma features, designed to reflect melodic and harmonic content and be invariant to instrumentation. Chroma features are less informative for classes such as ...

On the Inherent Segment Length in Music

Music consists of sounds organized in time. These sounds can be understood from a rhythmic, timbral, or harmonic point of view, and they can be understood on different time scales, going from the very short (note onsets) to the medium (grouping), to the large scale with musical form. Note onsets, grouping and form are common musical terms, which can be compared to different aspects of audition,...

Publication date: 2011