Link to paper: https://arxiv.org/abs/1903.06353

Following the distributional hypothesis in semantics, the goal of this research is to adopt the skip-gram version of the word2vec model for the distributional representation of melodic units.

I’m not an expert in music theory, but apparently there exists some evidence that monophonic melodies adhere to the distributional hypothesis! That is, “you shall know a motif by the company it keeps” (the motifs that appear near it). Alvarez & Gómez-Martin investigate this hypothesis by creating a skip-gram word2vec-ish model of a large collection of monophonic folk songs.

The authors are interested in modelling melodic context using small musical motifs instead of words. They encode each interval using two digits: the size of the interval (first digit), and whether it is ascending or descending (second digit, 1 or 0 respectively). For example, 21 represents an ascending major second. Repeated notes are encoded as 00. They build n-grams from adjacent intervals, using a greedy algorithm to record frequent occurrences. After this encoding, they run a vanilla skip-gram model over their folk song corpus.
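To make the encoding concrete for myself, here is a minimal sketch of the pipeline as I understand it. It is not the authors' code, and it rests on several assumptions of mine: melodies are given as MIDI pitch numbers, the first digit counts semitones (so intervals wider than 9 semitones would produce longer codes, which this sketch does not handle), and the motifs are plain sliding n-grams rather than the greedy frequent-motif selection from the paper. The function names `encode_intervals`, `ngrams`, and `skipgram_pairs` are made up for illustration.

```python
def encode_intervals(pitches):
    """Turn MIDI pitch numbers into interval codes such as '21' or '00'.

    Assumption: the first digit is the interval size in semitones and the
    second digit is the direction (1 = ascending, 0 = descending).
    """
    codes = []
    for prev, curr in zip(pitches, pitches[1:]):
        size = abs(curr - prev)
        if size == 0:
            codes.append("00")  # repeated note
        else:
            direction = 1 if curr > prev else 0
            codes.append(f"{size}{direction}")
    return codes


def ngrams(codes, n):
    """Join every run of n adjacent interval codes into one motif token."""
    return ["-".join(codes[i:i + n]) for i in range(len(codes) - n + 1)]


def skipgram_pairs(tokens, window):
    """List the (target, context) pairs a skip-gram model would train on."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


melody = [60, 62, 62, 64, 62]          # C4 D4 D4 E4 D4
codes = encode_intervals(melody)       # ['21', '00', '21', '20']
motifs = ngrams(codes, 2)              # ['21-00', '00-21', '21-20']
pairs = skipgram_pairs(motifs, window=1)
```

In a real experiment the motif tokens from each song would be fed to an off-the-shelf skip-gram implementation instead of enumerating pairs by hand; the pair list here is only meant to show what "context" means for a motif.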

They don’t drill too hard into their results. However, computing cosine similarities between different interval vectors allows them to uncover shared motifs in different folk songs! It’s very exciting that motif embeddings are possible to create, and it makes me wonder whether companies like Spotify or Shazam are performing these computations already. Alvarez & Gómez-Martin readily admit that much more work is required to validate the embeddings, but this is a great start.
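For reference, the cosine similarity comparison boils down to something like the sketch below. The two-dimensional vectors are invented purely for illustration (they are not results from the paper), and `most_similar` is a hypothetical helper of mine, not something the authors describe.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings):
    """Rank every other motif by cosine similarity to the query motif."""
    scores = [
        (motif, cosine_similarity(embeddings[query], vector))
        for motif, vector in embeddings.items()
        if motif != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up toy embeddings, only to show the mechanics.
toy_embeddings = {
    "21-21": [1.0, 0.0],
    "21-20": [0.8, 0.6],
    "00-00": [0.0, 1.0],
}

ranking = most_similar("21-21", toy_embeddings)
```

With these toy numbers, `21-20` ranks closest to `21-21` and `00-00` ranks last; with real embeddings the same ranking step is what would surface motifs that occur in similar melodic contexts.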