Multi-dimensional views of music structure (2017–2019)
Music structure is typically viewed as a one-dimensional phenomenon: each point in time is assigned a single label, which tells you which other points in the music are similar and which are different. However, musical structure is richer than that: different instrument parts can set up independent patterns of repetition, and different musical attributes, like melody, harmony and timbre, can be salient at different times. The following research projects explore these broader views of music structure.
Remix artists want to be able to access the loops and isolated sounds that were used to create a song, but might only have access to the mixed audio. The Unmixer gives them a way to access the “ingredients” of a song and experiment with mashup ideas, all in one place!
Plenty of electronic music is based around loops. We propose a way to identify and extract the loops via source separation, and to provide a map of the piece, all in a single step using nonnegative tensor factorization.
In typical songs, different instrument parts repeat at different times with different patterns. These patterns are evident in the score, but can we discover all of these independent patterns from audio recordings? We lay out a framework for this ambitious new problem, including methods for solving it, generating data, and evaluating our performance. The results so far are underwhelming, but through our efforts we learned a lot about the sparseness of the problem we posed.
We previously proposed a method for estimating what musical attributes someone was paying attention to when they analyzed a piece of music. We make several improvements to that algorithm, validate it using the sound examples from the Attention and Grouping project, and present a novel way to visualize the results.
We present a method of classifying videos found online according to what type of derivative work they are: covers, remixes, dance performances, lyric videos, and so on. By combining search, text, audio and video features, we are able to classify videos more accurately than a method based on YouTube search results alone.
Combining the fun of mashups with the challenge of logic puzzles, we present CrossSong: a music-based puzzle game in which the goal is to recognize the component parts of mashups. The player sees a grid of tiles, each containing a mashup of two songs, and must rearrange them so that the tiles in each row and each column contain parts of the same song. The goal of the project was to make a fun game, but our research contributions include an algorithm for finding an optimal combination of mashups, and a user evaluation. The project page includes a link to the playable game!
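To make the winning condition concrete, here is a minimal sketch of a solution checker. It is only an illustration of the rule described above, not the game's actual code: the representation of a tile as a pair of song names, and the function name `is_solved`, are assumptions made for this example.

```python
def is_solved(grid):
    """Return True if the tiles in every row share a song and the
    tiles in every column share a song.

    `grid` is a list of rows; each tile is a pair of song names
    (a hypothetical representation chosen for this sketch).
    """
    def share_a_song(tiles):
        # A line is consistent if some song appears in all of its tiles.
        return bool(set.intersection(*(set(tile) for tile in tiles)))

    rows = [list(row) for row in grid]
    columns = [list(col) for col in zip(*grid)]
    return all(share_a_song(line) for line in rows + columns)


# A 2x2 puzzle: songs A and B run along the rows, X and Y along the columns.
solved = [[("A", "X"), ("A", "Y")],
          [("B", "X"), ("B", "Y")]]

# The same tiles with two of them swapped.
shuffled = [[("A", "X"), ("B", "Y")],
            [("B", "X"), ("A", "Y")]]

print(is_solved(solved), is_solved(shuffled))
```

The harder problem mentioned above, choosing which mashups to build in the first place, is not covered by this sketch.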
My PhD thesis considered listener disagreements in the analysis of musical structure, and looked at the issue from the viewpoints of several disciplines: music information retrieval, music theory, and music perception and cognition. The following four projects comprised the main chapters:
We tested whether the focus of a listener could affect their perception of structure. In an online listening study, we found that by manipulating someone’s attention, whether overtly or obliquely, we could influence the salience of boundaries and the structural groupings they preferred.
We propose a method for estimating what musical attributes someone was paying attention to when they analyzed a piece of music. The goal is to find a correlation between a recording and a listener’s annotation of it. We used a simple quadratic programming algorithm and obtained some interesting section-by-section maps of pieces.
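As a rough illustration of how such a problem can be posed, the sketch below fits nonnegative weights so that a weighted sum of per-feature similarity values approximates the similarities implied by an annotation. Nonnegative least squares is a small quadratic program, but this is not necessarily the formulation used in the project: the feature names, the toy numbers, and the use of projected gradient descent in place of a dedicated QP solver are all assumptions made for this example.

```python
def fit_nonnegative_weights(features, target, steps=2000, rate=0.05):
    """Minimize ||sum_k w[k] * features[k] - target||^2 subject to w >= 0,
    using projected gradient descent (a stand-in for a real QP solver)."""
    weights = [0.0] * len(features)
    for _ in range(steps):
        # Residual of the current weighted combination against the target.
        residual = [
            sum(w * f[i] for w, f in zip(weights, features)) - t
            for i, t in enumerate(target)
        ]
        for k, f in enumerate(features):
            gradient = 2 * sum(r * x for r, x in zip(residual, f))
            # Gradient step, then project back onto the constraint w >= 0.
            weights[k] = max(0.0, weights[k] - rate * gradient)
    return weights


# Four segments annotated A B A B. Each entry below describes one pair of
# segments, in the order (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
annotation = [0, 1, 0, 0, 1, 0]             # 1 where the two labels match
harmony = [0.2, 0.9, 0.1, 0.2, 0.8, 0.1]    # hypothetical harmonic similarity
timbre = [0.9, 0.1, 0.2, 0.1, 0.2, 0.8]     # hypothetical timbral similarity

weights = fit_nonnegative_weights([harmony, timbre], annotation)
print(dict(zip(["harmony", "timbre"], weights)))
```

In this toy case the harmony feature receives all of the weight and the timbre weight is clamped to zero by the constraint, which would be read as the annotation following harmony rather than timbre. Repeating the fit per section would give the kind of section-by-section map mentioned above.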
How well does acoustic novelty account for boundary indications in an annotated corpus? We looked at how peaks in novelty (at various timescales and in various musical features) correlated with boundary indications. We found that novelty is a necessary but not sufficient condition for being a boundary.
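The "necessary but not sufficient" finding can be read in terms of two rates: nearly every annotated boundary sits near a novelty peak, but many novelty peaks are not boundaries. The sketch below shows one simple way to measure this on a toy novelty curve. The peak-picking rule, the threshold, the tolerance and the numbers are all assumptions made for illustration, not the procedure or data from the study.

```python
def pick_peaks(novelty, threshold):
    """Return the indices of local maxima that exceed the threshold."""
    return [
        i for i in range(1, len(novelty) - 1)
        if novelty[i] > threshold
        and novelty[i] > novelty[i - 1]
        and novelty[i] >= novelty[i + 1]
    ]


def match_rates(peaks, boundaries, tolerance):
    """Return (share of peaks near a boundary, share of boundaries near a peak)."""
    def near(x, others):
        return any(abs(x - other) <= tolerance for other in others)

    peaks_explained = sum(near(p, boundaries) for p in peaks) / len(peaks)
    boundaries_explained = sum(near(b, peaks) for b in boundaries) / len(boundaries)
    return peaks_explained, boundaries_explained


# Toy novelty curve (one value per frame) and annotated boundary frames.
novelty = [0.1, 0.8, 0.2, 0.1, 0.7, 0.1, 0.9, 0.2, 0.6, 0.1]
boundaries = [1, 6]

peaks = pick_peaks(novelty, threshold=0.5)
peaks_explained, boundaries_explained = match_rates(peaks, boundaries, tolerance=1)
print(peaks, peaks_explained, boundaries_explained)
```

Here every boundary coincides with a peak, but only half of the peaks coincide with a boundary, which is the pattern the paragraph above describes.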
How can two people listen to the same music, but disagree about its structure? In this case study, co-authored with Isaac Schankler, we tried to trace the evolution of analytical disagreements to understand their origin. We analyzed the same pieces of music, and then thoroughly compared our justifications for our analyses when they disagreed. The disagreements seemed to boil down to differences in attention, prior knowledge and expectation. We published the article in Music Theory Online, which meant we could include all the relevant audio files, videos and illustrations.
After several years of running the evaluations of structural analysis at MIREX, what can we learn about which evaluation metrics are useful, which are redundant, and which songs are hardest to analyze? This ISMIR paper focused on these questions. The work was published as “open research”, meaning that all the tools and data used to produce the article are provided in a public repository.
For my Master’s thesis, submitted in August 2010, I conducted a comparative evaluation of a handful of algorithms that produce formal analyses of music on a diverse set of corpora, including a new corpus of public domain music.
The Structural Analysis of Large Amounts of Music Information (SALAMI) project is a multi-national effort to produce a corpus of analyses of hundreds of thousands of pieces of music. I oversaw the first phase of this project: the creation of a huge ground truth dataset. As part of my work, I developed and tested a novel annotation format, evaluated and hired annotators, oversaw months of data collection and presented the data at ISMIR 2011. I continue to manage and develop the dataset, which was released to the public in February 2012. It’s available now on GitHub.