Corpus Analysis Tools for Computational Hook Discovery
نویسندگان
چکیده
Compared to studies with symbolic music data, advances in music description from audio have overwhelmingly focused on ground truth reconstruction and maximizing prediction accuracy, with only a small fraction of studies using audio description to gain insight into musical data. We present a strategy for the corpus analysis of audio data that is inspired by the FANTASTIC toolbox and optimized for interpretable results. The approach brings two previously unexplored concepts to the audio domain: audio bigram distributions to describe melodic and harmonic content, and the use of corpus-relative or “second-order” descriptors. To test the real-world applicability of our method, we present an experiment in which we model song recognition data collected in a widely-played music game. By using the proposed corpus analysis pipeline we are able to present a cognitively adequate analysis that allows a model interpretation in terms of the listening history and experience of our participants. We find that our corpus-based audio features are able to explain a comparable amount of variance to symbolic features for this task when used alone and that they can supplement symbolic features profitably when the two types of features are used in tandem. We discuss the further potential of audio features for corpus analysis, and highlight new insights into what makes music recognizable.
منابع مشابه
Producing a Persian Text Tokenizer Corpus Focusing on Its Computational Linguistics Considerations
The main task of the tokenization is to divide the sentences of the text into its constituent units and remove punctuation marks (dots, commas, etc.). Each unit is a continuous lexical or grammatical writing chain that is an independent semantic unit. Tokenization occurs at the word level and the extracted units can be used as input to other components such as stemmer. The requirement to create...
متن کاملThe Efficiency of Corpus-based Distributional Models for Literature-based Discovery on Large Data Sets
This paper evaluates the efficiency of a number of popular corpus-based distributional models in performing discovery on very large document sets, including online collections. Literature-based discovery is the process of identifying previously unknown connections from text, often published literature, that could lead to the development of new techniques or technologies. Literature-based discov...
متن کاملHedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners
Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...
متن کاملCorpus based coreference resolution for Farsi text
"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
متن کاملA Noun Phrase Parser of English
A noun phrase parser is useful for several purposes, e.g. for index term generation in an information retrieval application; for the extraction of collocational knowledge from large corpora for the development of computational tools for language analysis; for providing a shallow but accurately analysed input for a more ambitious parsing system; for the discovery of translation units, and so on....
متن کامل