New articles by Watanabe and Yang

January 20, 2017

Last year, I helped advise two colleagues on separate projects, both of which were recently published. The projects and their first authors are: “Modeling discourse segments in lyrics using repeated patterns”, by Kento Watanabe; and “Probabilistic transcription of sung melody using a pitch dynamics model”, by Luwei Yang.

Kento Watanabe has done lots of work related to song lyrics, and is interested in modeling their “discourse structure”: that is, the large-scale organization of the lyrics, including how the subject changes from stanza to stanza.

He analyzed a large corpus of lyrics and found that repeated patterns often coincide with segment boundaries. For example, a line that resembles the last line of the lyrics has a strong chance of being the last or penultimate line of a stanza. He used these and other insights to build and test a lyrics-segmentation algorithm.
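To make the idea concrete, here is a minimal sketch in Python of how repetition cues could drive a boundary classifier. It is not the paper's implementation: the similarity measure (difflib), the three features per line, and the toy training data are all stand-ins for illustration.

```python
from difflib import SequenceMatcher
from sklearn.linear_model import LogisticRegression

def line_similarity(a, b):
    """Normalized string similarity between two lyric lines (0 to 1)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def boundary_features(lines):
    """One feature vector per line: similarity to the previous line,
    to the next line, and to the final line of the song (the cue
    mentioned above)."""
    feats = []
    for i, line in enumerate(lines):
        prev_sim = line_similarity(line, lines[i - 1]) if i > 0 else 0.0
        next_sim = line_similarity(line, lines[i + 1]) if i + 1 < len(lines) else 0.0
        feats.append([prev_sim, next_sim, line_similarity(line, lines[-1])])
    return feats

# Toy training data (hypothetical): each song is a list of lines plus a
# 0/1 label per line marking whether a segment boundary follows it.
songs = [
    (["hello darkness", "my old friend", "hello darkness", "my old friend"],
     [0, 1, 0, 1]),
]

X = [f for lines, _ in songs for f in boundary_features(lines)]
y = [lab for _, labels in songs for lab in labels]
clf = LogisticRegression().fit(X, y)

# Probability that a boundary follows each line of the first song:
print(clf.predict_proba(boundary_features(songs[0][0]))[:, 1])
```

The model in the paper builds richer features from repeated patterns across the whole lyric, plus textual features, but the shape of the pipeline is the same: repetition features in, per-line boundary predictions out.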

Details: “Modeling discourse segments in lyrics using repeated patterns.” By Kento Watanabe, Yuichiroh Matsubayashi, Naho Orita, Naoaki Okazaki, Kentaro Inui, Satoru Fukayama, Tomoyasu Nakano, Jordan B. L. Smith, and Masataka Goto. In Proceedings of the International Conference on Computational Linguistics (COLING). 2016.

Abstract: This study proposes a computational model of the discourse segments in lyrics to understand and to model the structure of lyrics. To test our hypothesis that discourse segmentation in lyrics strongly correlates with repeated patterns, we conduct the first large-scale corpus study on discourse segments in lyrics. Next, we propose the task of automatically identifying segment boundaries in lyrics and train a logistic regression model for the task with repeated-pattern and textual features. The results of our empirical experiments illustrate the significance of capturing repeated patterns in predicting the boundaries of discourse segments in lyrics.

Luwei Yang is interested in precise pitch tracking of the human voice and of instruments like the violin and erhu. For these sources, it is hard to detect and disentangle the different types of fluctuation, like portamento (sliding from one note to another), vibrato (rapid oscillation in pitch), and tremolo (rapid oscillation in loudness). In this work, he proposes and evaluates a new way to model between-note and within-note pitch fluctuations.

Details: “Probabilistic transcription of sung melody using a pitch dynamics model.” By Luwei Yang, Akira Maezawa, Jordan B. L. Smith, and Elaine Chew. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). New Orleans, LA, USA. 2017.

Abstract: Transcribing the singing voice into music notes is challenging due to pitch fluctuations such as portamenti and vibratos. This paper presents a probabilistic transcription method for monophonic sung melodies that explicitly accounts for these local pitch fluctuations. In the hierarchical Hidden Markov Model (HMM), an upper-level ergodic HMM handles the transitions between notes, and a lower-level left-to-right HMM handles the intra- and inter-note pitch fluctuations. The lower-level HMM employs the pitch dynamics model, which explicitly expresses the pitch curve characteristics as the observation likelihood over f0 and ∆f0 using a compact parametric distribution. A histogram-based tuning frequency estimation method and some post-processing heuristics to separate merged notes and to allocate spuriously detected short notes improve the note recognition performance. With model parameters selected to support intuitions about singing behavior, the proposed method obtained encouraging results when evaluated on a published monophonic sung melody dataset and compared with state-of-the-art methods.
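To give a flavor of what “an observation likelihood over f0 and ∆f0” means, here is a toy two-state version in Python. The Gaussian form and all of the widths are illustrative assumptions, not the paper's compact parametric distribution; the point is only that a steady, in-tune frame should score well under a within-note “steady” state, while a fast-moving frame should score well under a portamento-like “transition” state.

```python
from scipy.stats import norm

def obs_loglik(f0, df0, note_pitch, state):
    """Toy log-likelihood of one (f0, delta-f0) frame, in semitones.
    'steady':     f0 close to the note's pitch, delta-f0 near zero.
    'transition': f0 loosely constrained, delta-f0 allowed to be large.
    All scale values below are made up for illustration."""
    if state == "steady":
        return (norm.logpdf(f0, loc=note_pitch, scale=0.3)
                + norm.logpdf(df0, loc=0.0, scale=0.1))
    else:  # portamento-like slide into or out of the note
        return (norm.logpdf(f0, loc=note_pitch, scale=2.0)
                + norm.logpdf(df0, loc=0.0, scale=1.0))

# A steady frame on the note is more likely under the 'steady' state...
print(obs_loglik(60.02, 0.01, 60, "steady") >
      obs_loglik(60.02, 0.01, 60, "transition"))   # True
# ...and a sliding frame (large delta-f0) under the 'transition' state.
print(obs_loglik(59.3, 0.8, 60, "steady") <
      obs_loglik(59.3, 0.8, 60, "transition"))     # True
```

In the paper's system, per-frame scores like these live inside the lower-level left-to-right HMM, while the upper-level ergodic HMM decides which note the lower level is currently explaining; decoding the whole hierarchy yields the note transcription.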