An interface for extracting and remixing loops
- Smith, J. B. L., Y. Kawasaki, and M. Goto. 2019. Unmixer: An interface for extracting and remixing loops. Proceedings of the International Society for Music Information Retrieval Conference. 824–831. Delft, NL.
Remixing is hard
Have you ever tried to make a remix or mashup of a song? There’s a lot of overhead at the start: you need to decide which sounds you want to re-use, find them, cut them out, save them as separate files, and load them into the software (a digital audio workstation, or DAW) that you’ll use to make the remix.
But in a typical song, all the samples are inextricably mixed: the synth melody you hope to re-use can’t easily be separated from the drum loops, bass lines, and other sounds that occur at the same time.
What if there were a way to automatically “de-compose” a song and isolate the ingredients used to compose it? That’s what Unmixer aims to do!
Unmixer gives users a way to extract loops from any song and start remixing and mashing them up with other songs right away.
We aimed for a simple interface that anyone would quickly understand.
Casual users can take advantage of the “Quick Start” option that loads up some demo samples extracted from Creative-Commons-licensed music.
Expert users may want more controls like equalizers, but we felt that anyone wanting to produce a remix or mashup would rather download the loops and process them offline in their favourite DAW anyway.
That said, we’re exploring ways to give users more control, including different-size loops, keyboard controls, and some basic equalization—stay tuned! (If you have any suggestions, feel free to email us.)
How it works
It’s a 4-step process, but about 90% of the magic happens in step 3. Throughout, we use the Daft Punk song “Doin’ it Right” (featuring Panda Bear) as our test song.
NB: I don’t own the copyright to this song! I am including very short audio excerpts here for educational purposes only.
Step 1: estimate the downbeats. If you can count along to a song (“1, 2, 3, 4, 1, 2, …”), the downbeats are the 1s. It’s easy for people to do this, but reliable algorithms for estimating downbeats have only recently become available. We use madmom.
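To make "the downbeats are the 1s" concrete, here is a minimal sketch. It assumes the beat tracker returns one row per beat in the form (time in seconds, position in bar), which is, as far as we know, the layout of madmom's downbeat tracking output. The beat values and the `downbeat_times` helper are made up for illustration.

```python
import numpy as np

def downbeat_times(beats):
    """Return the times of beats whose position in the bar is 1."""
    beats = np.asarray(beats, dtype=float)
    return beats[beats[:, 1] == 1, 0]

# Hypothetical tracker output: (time in seconds, beat position in bar).
beats = np.array([
    [0.50, 1], [1.00, 2], [1.50, 3], [2.00, 4],
    [2.50, 1], [3.00, 2], [3.50, 3], [4.00, 4],
    [4.50, 1],
])
downbeats = downbeat_times(beats)
print(downbeats)  # [0.5 2.5 4.5]
```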
Step 2: compute the spectrogram and stack into a “spectral cube”. The spectrogram is a 2D representation of the signal that shows frequency vs. time in piece:
We snip the spectrogram at each downbeat boundary (the dotted lines), and then stack these 2D slices into a 3D volume, also called a tensor. The tensor has dimensions frequency vs. time in bar vs. bar in piece.
By the way, we use librosa for handling the audio.
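The snip-and-stack step can be sketched in plain NumPy. This is only an illustration: the spectrogram is random data, the downbeat frame indices are hypothetical, and resampling every bar to the same number of frames by picking evenly spaced frames is our assumption about one simple way to handle bars of slightly different lengths.

```python
import numpy as np

def spectral_cube(spec, downbeat_frames, frames_per_bar):
    """Cut a (freq x time) spectrogram at downbeats and stack the bars.

    Returns an array of shape (n_freq, frames_per_bar, n_bars).
    """
    bars = []
    for start, end in zip(downbeat_frames[:-1], downbeat_frames[1:]):
        # Pick frames_per_bar evenly spaced frames so every bar has equal length.
        idx = np.linspace(start, end, frames_per_bar, endpoint=False).astype(int)
        bars.append(spec[:, idx])
    return np.stack(bars, axis=-1)

rng = np.random.default_rng(0)
spec = rng.random((64, 400))                # stand-in for a magnitude spectrogram
downbeat_frames = [0, 100, 198, 301, 400]   # hypothetical downbeat positions, in frames
cube = spectral_cube(spec, downbeat_frames, frames_per_bar=96)
print(cube.shape)  # (64, 96, 4)
```

The three axes are frequency, time in bar, and bar in piece, in that order.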
Step 3: compute the non-negative Tucker decomposition. This is the magic part!
A typical technique to compress the information in a matrix or tensor is called factorization. In the case of a 2D matrix, this amounts to finding a small set of “template” columns that you can copy/paste out to recreate the matrix.
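As a toy illustration of the copy/paste idea (all values are made up), the matrix below is built entirely from two template columns, so storing the templates plus a record of where each one is pasted takes fewer numbers than storing the matrix itself.

```python
import numpy as np

# Two hypothetical "template" columns.
templates = np.array([[1.0, 0.0],
                      [2.0, 1.0],
                      [0.0, 3.0],
                      [1.0, 1.0]])                      # shape (4, 2)

# Each column of `activations` says which template to paste at that position.
activations = np.array([[1, 0, 1, 1, 0, 1, 0, 1],
                        [0, 1, 0, 0, 1, 0, 1, 0]],
                       dtype=float)                     # shape (2, 8)

matrix = templates @ activations                        # shape (4, 8)
stored = templates.size + activations.size
print(f"{stored} numbers stored instead of {matrix.size}")
```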
The non-negative Tucker decomposition (NTD) models our 3D tensor (freq. x time x bar) as 3 matrices—sound templates, rhythm templates, and loop templates—and a core tensor that tells us how to multiply these templates together.
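Written out element by element, the model looks like the following. The symbol names are our own choice for this explainer: $\mathcal{X}$ is the spectral cube, $W$, $H$ and $D$ hold the sound, rhythm and loop templates as columns, $\mathcal{G}$ is the core tensor, and $r_s$, $r_r$, $r_l$ are the numbers of templates of each kind.

```latex
\mathcal{X}_{f,t,b} \;\approx\; \sum_{i=1}^{r_s} \sum_{j=1}^{r_r} \sum_{k=1}^{r_l}
    \mathcal{G}_{i,j,k} \, W_{f,i} \, H_{t,j} \, D_{b,k},
\qquad \mathcal{G},\, W,\, H,\, D \;\ge\; 0 .
```

Here $f$ indexes frequency, $t$ time in bar, and $b$ bar in piece; the non-negativity constraint applies to every entry.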
This is the slowest part of the algorithm, so we’re investigating ways to speed it up. But the compression can be very efficient: in this case, our model is around 1/100th the size of the original spectral cube. We compute the NTD with TensorLy.
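To see where the compression comes from, one can simply count the numbers stored. The sketch below uses hypothetical sizes, not the ones from the paper, so the ratio it prints is only indicative. It does not run the decomposition itself; TensorLy provides a `non_negative_tucker` function for that.

```python
def tucker_model_size(shape, ranks):
    """Count the numbers in a Tucker model: one factor matrix per mode, plus the core."""
    factors = sum(n * r for n, r in zip(shape, ranks))
    core = 1
    for r in ranks:
        core *= r
    return factors + core

cube_shape = (1025, 96, 100)   # hypothetical: frequency bins, frames per bar, bars
ranks = (32, 20, 14)           # hypothetical: sound, rhythm and loop templates

cube_size = cube_shape[0] * cube_shape[1] * cube_shape[2]
model_size = tucker_model_size(cube_shape, ranks)
print(f"model is {model_size / cube_size:.4f} of the cube size")
```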
Step 4: extract the loops. This is done by multiplying out the NTD one loop at a time. To see how this works, consider loop template #13, represented by the 13th slice of the core tensor:
This slice is a “recipe” for the loop: it’s a matrix where each element (n,m) tells us to take the n-th sound, have it play with the m-th rhythm, and to add the result to the loop.
Multiplying the core tensor slice with the rhythm and sound templates thus generates a spectrum we can listen to.
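In matrix terms, the recipe is a product of three matrices. The sketch below uses random non-negative templates and made-up sizes, and the `loop_spectrum` helper is hypothetical. It stops at the spectrum and skips the masking step needed to get back to audio.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_time, n_bars = 64, 96, 40          # hypothetical cube size
r_sound, r_rhythm, r_loop = 8, 6, 5          # hypothetical numbers of templates

sounds = rng.random((n_freq, r_sound))       # sound templates, one per column
rhythms = rng.random((n_time, r_rhythm))     # rhythm templates, one per column
loops = rng.random((n_bars, r_loop))         # loop templates, one per column
core = rng.random((r_sound, r_rhythm, r_loop))

def loop_spectrum(k):
    """Spectrum (freq x time in bar) of loop k."""
    recipe = core[:, :, k]                   # element (n, m) pairs sound n with rhythm m
    return sounds @ recipe @ rhythms.T

spectrum = loop_spectrum(3)
print(spectrum.shape)  # (64, 96)
```

Summing every loop's spectrum, each weighted by its loop template across bars, gives back the full model of the spectral cube.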
Lastly, the loop template tells us when these loops should occur in the piece. In the case of loop #13, it is repeated for 3 separate spans of the piece, highlighted below:
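One simple way to read spans off a loop template is to threshold its per-bar activation and collect the runs of consecutive active bars. The thresholding rule, the `active_spans` helper, and the activation values below are all our assumptions for illustration, not necessarily how Unmixer does it.

```python
import numpy as np

def active_spans(activation, threshold):
    """Return (first_bar, last_bar) pairs for runs where activation > threshold."""
    active = np.asarray(activation) > threshold
    # Pad with False so every run has a detectable start and end.
    padded = np.concatenate(([False], active, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1) - 1
    return list(zip(starts.tolist(), ends.tolist()))

# Hypothetical activation of one loop template, one value per bar.
activation = np.array([0.0, 0.9, 0.8, 0.1, 0.0, 0.7, 0.9, 0.8, 0.0, 0.6])
print(active_spans(activation, threshold=0.5))  # [(1, 2), (5, 7), (9, 9)]
```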
But we don’t actually have to use this information now. We’ve reconstructed the loop audio, and it’s ready to be inserted into the Unmixer interface!
Here’s a sound board of all the main loops of “Doin’ it Right”: clicking any loop triggers that sound, and the stop sign pauses everything.
That’s the gist of it! This explainer leaves out some very important details, like how to use masking techniques to reconstruct the sources from the spectra, how to choose which instance of a loop to extract, and how to adjust the factorization to produce loops that aren’t too similar. If you’re interested, these details are all explained in our ISMIR paper.
And of course, visit our website to try out Unmixer yourself!