Video categorization article accepted to ICME

February 28, 2017

Yesterday, we were pleased to receive the acceptance notification for our submission to the 2017 IEEE International Conference on Multimedia and Expo (ICME), entitled “Classifying derivative works with search, text, audio and video features.”

Of greatest interest to MIR researchers will be the “novel set of audio features derived from audio fingerprints” that we used to detect similarity between audio files. The full citation and abstract are below:

Details: Jordan B. L. Smith, Masahiro Hamasaki, and Masataka Goto. “Classifying derivative works with search, text, audio and video features.” In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, China, 2017.

Abstract: Users of video-sharing sites often search for derivative works of music, such as live versions, covers, and remixes. Audio and video content are both important for retrieval: “karaoke” specifies audio content (instrumental version) and video content (animated lyrics). Although YouTube’s text search is fairly reliable, many search results do not match the exact query. We introduce an algorithm to automatically classify YouTube videos by category of derivative work. Based on a standard pipeline for video-based genre classification, it combines search, text, and video features with a novel set of audio features derived from audio fingerprints. A baseline approach is outperformed by the search and text features alone, and combining these with video and audio features performs best of all, reducing the audio content error rate from 25% to 13%.
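For readers curious what “audio features derived from audio fingerprints” might look like in practice, here is a minimal, hypothetical sketch in Python. It is not the feature set from the paper: it simply hashes pairs of strong spectral peaks in adjacent frames (a crude landmark-style fingerprint) and turns the overlap between two clips into a single similarity score that a classifier could consume. All function names and parameters are illustrative.

```python
import numpy as np

def toy_fingerprint(signal, n_fft=2048, hop=1024, peaks_per_frame=3):
    """Landmark-style toy: hash pairs of strong spectral peaks
    in adjacent STFT frames of a mono signal (1-D numpy array)."""
    window = np.hanning(n_fft)
    frame_peaks = []
    for start in range(0, len(signal) - n_fft, hop):
        mag = np.abs(np.fft.rfft(signal[start:start + n_fft] * window))
        frame_peaks.append(np.argsort(mag)[-peaks_per_frame:])  # strongest bins
    hashes = set()
    for peaks_a, peaks_b in zip(frame_peaks, frame_peaks[1:]):
        for p1 in peaks_a:
            for p2 in peaks_b:
                hashes.add((int(p1) // 4, int(p2) // 4))  # coarse quantization
    return hashes

def fingerprint_similarity(sig_a, sig_b):
    """Jaccard overlap of the two hash sets, in [0, 1];
    usable as one scalar feature in a derivative-work classifier."""
    fa, fb = toy_fingerprint(sig_a), toy_fingerprint(sig_b)
    return len(fa & fb) / len(fa | fb) if fa and fb else 0.0
```

A production system would keep time offsets in the hashes and build on an established fingerprinter such as Chromaprint; the point here is only the general shape of the idea, reducing fingerprint overlap to scalar features that can be combined with search, text, and video features.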

I look forward to visiting Hong Kong for the first time to present our work!