Semantic Similarity for Music Retrieval
Authors
Abstract
We present a query-by-example system for content-based music information retrieval that ranks items in a database by their semantic similarity, rather than acoustic similarity, to a query example. The retrieval system is based on semantic concept models that are learned from the CAL500 data set, which contains both audio examples and their text captions. Using the concept models, the audio tracks are mapped into a semantic feature space, where each dimension indicates the strength of one semantic concept. Audio similarity and retrieval are then based on ranking the database tracks by their similarity to the query in the semantic space.

1 MODELING AUDIO AND SEMANTICS

Our query-by-example music information retrieval (MIR) system takes an audio track as a query and retrieves new audio tracks that have similar semantic descriptions to the query track. For example, given a piece of music that a listener might describe as “crazy guitar rock with a screaming female singer that makes me want to get up and dance”, our system ranks all retrievable songs by how well they fit this description. The system is based on the models of [9, 3], which have shown promise in the domains of audio and image retrieval.

Audio models are learned from a database of audio tracks with associated text captions that describe the audio content:

$$D = \{(A^{(1)}, c^{(1)}), \ldots, (A^{(|D|)}, c^{(|D|)})\} \qquad (1)$$

where $A^{(d)}$ and $c^{(d)}$ represent the $d$-th audio track and the associated text caption, respectively. Each caption is a set of words from a fixed vocabulary, $V$. We train our system using the semantic labels from the CAL500 data set [9] of 500 songs, each annotated by at least 3 humans using up to 200 words. We require that each word be positively associated with at least 10 songs, resulting in a vocabulary of 146 words ($|V| = 146$).

© 2007 Austrian Computer Society (OCG).

1.1 Modeling Audio Tracks

The audio data for a single track is represented as a bag-of-feature-vectors, i.e., an unordered set of feature vectors $A = \{a_1, \ldots, a_{|A|}\}$ that are extracted from the audio signal. For each 22050 Hz-sampled, monaural audio track, we compute the first 13 Mel-frequency cepstral coefficients as well as their first and second instantaneous derivatives for each half-overlapping short-time (∼12 msec) segment [2], resulting in about 5000 39-dimensional feature vectors per 30 seconds of audio content. Each database track $d$ is compactly represented as a probability distribution over the audio feature space, $P(a \mid d)$. The track distribution is approximated as a $K$-component Gaussian mixture model (GMM);
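The feature counts quoted in Section 1.1 can be checked with a little arithmetic. The following is a minimal sketch, not the authors' extraction code: the function names are hypothetical, and the exact window length (12 msec rounded to a whole number of samples) and the rule of dropping a trailing partial window are assumptions, since the text only gives "∼12 msec" and "half-overlapping".

```python
# Sketch of the framing arithmetic from Section 1.1 (assumed details noted below).

SAMPLE_RATE_HZ = 22050   # from the text
N_MFCC = 13              # from the text
WINDOW_MS = 12.0         # the text says "~12 msec"; the exact value is an assumption


def window_samples(duration_ms: float = WINDOW_MS,
                   sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Window length in samples (rounding is an assumption)."""
    return round(duration_ms * sample_rate / 1000)


def frame_count(n_samples: int, win: int, hop: int) -> int:
    """Number of full windows that fit; a trailing partial window is dropped (assumption)."""
    if n_samples < win:
        return 0
    return 1 + (n_samples - win) // hop


def feature_dim(n_mfcc: int = N_MFCC) -> int:
    """MFCCs plus their first and second derivatives."""
    return 3 * n_mfcc


win = window_samples()
hop = win // 2                                   # half-overlapping segments
n_frames = frame_count(30 * SAMPLE_RATE_HZ, win, hop)

print(f"window = {win} samples, hop = {hop} samples")
print(f"{n_frames} vectors of dimension {feature_dim()} per 30 s")
```

Under these assumptions the count lands close to the "about 5000" vectors per 30 seconds stated above, and the dimension is 13 × 3 = 39.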
Similar resources
Publishing Music Similarity Features on the Semantic Web
We describe the process of collecting, organising and publishing a large set of music similarity features produced by the SoundBite [10] playlist generator tool. These data can be a valuable asset in the development and evaluation of new Music Information Retrieval algorithms. They can also be used in Web-based music search and retrieval applications. For this reason, we make a database of feat...
Eyes4Ears - More than a Classical Music Retrieval System
Content-based similarity search for music retrieval has attracted a lot of attention in recent information retrieval research. Most music applications (e.g. several commercial web portals) offer to search music files, which however is limited to keyword-based search on subjects like genre or artist. Other similarity search approaches are based on abstract metrics, which are defined on feature vectors repr...
Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica
Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smartphones. Methods: This multi-method research simultaneously follows the simulation technique from the mixed models of operations research methodology and the documentary research method. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...
Feature Preprocessing with Restricted Boltzmann Machines for Music Similarity Learning
Computational modelling of music similarity constitutes a key element for music information retrieval and recommendation systems. Similarity models and their analysis are also important for research in musicology and music perception. In this study, we test feature preprocessing with Restricted Boltzmann Machines in combination with established methods for learning distance measures. Our experi...
Anchor space for classification and similarity measurement of music
This paper describes a method of mapping music into a semantic space that can be used for similarity measurement, classification, and music information retrieval. The value along each dimension of this anchor space is computed as the output from a pattern classifier which is trained to measure a particular semantic feature. In anchor space, distributions that represent objects such as artists o...
Comparison and partial ordering of music by applying a generic semantic index
Instances of simple data types like strings or numbers can easily be compared and arranged on the basis of a lexical or numerical ordering system. Generic operations are available for the test of equality, greater-than or less-than relations can be generated, and sometimes even similarity can be assessed. But how to handle multimedia data types such as music? Can we compare music at all? Despite ...