Research

Sanidha: A Studio Quality Multi-Modal Dataset for Carnatic Music

Music source separation demixes a piece of music into its individual sound sources (vocals, percussion, melodic instruments, etc.), a task with no simple mathematical solution. It requires deep learning methods trained on large datasets of isolated music stems. The most commonly available datasets are made from commercial Western music, which limits the models' applicability to non-Western genres such as Carnatic music. Because Carnatic music is a live tradition, the available multi-track recordings contain overlapping sounds and bleed between the sources. This poses a challenge to commercially available source separation models like Spleeter and Hybrid Demucs. In this work, we introduce Sanidha, the first open-source dataset for Carnatic music, offering studio-quality, multi-track recordings with minimal to no overlap or bleed. Along with the audio files, we provide high-definition videos of the artists' performances. Additionally, we fine-tuned Spleeter, one of the most commonly used source separation models, on our dataset and observed improved signal-to-distortion ratio (SDR) performance compared to fine-tuning on a pre-existing Carnatic multi-track dataset. The outputs of the model fine-tuned on Sanidha are also evaluated through a listening study.
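For readers unfamiliar with the metric, the sketch below illustrates the basic definition of SDR: the energy of the reference stem divided by the energy of the estimation error, expressed in decibels. It is a minimal illustration only. The function name `sdr_db`, the `eps` stabiliser, and the synthetic sine-wave "stem" and "bleed" signals are assumptions made for this example, and the evaluation reported in the work may rely on a different SDR variant or toolkit.

```python
import math


def sdr_db(reference, estimate, eps=1e-10):
    """Basic signal-to-distortion ratio in dB (illustrative helper, not the paper's code).

    SDR = 10 * log10(sum(reference^2) / sum((reference - estimate)^2))
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps avoids division by zero and log(0) for silent or perfectly matched signals
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Synthetic example (assumed values): a clean 220 Hz "stem" and an estimate
# contaminated by a quieter 330 Hz "bleed" from another source.
sample_rate = 8000
clean = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(sample_rate)]
bleed = [0.1 * math.sin(2 * math.pi * 330 * n / sample_rate) for n in range(sample_rate)]
estimate = [c + b for c, b in zip(clean, bleed)]

sdr_value = sdr_db(clean, estimate)
print(f"SDR: {sdr_value:.2f} dB")
```

In this toy case the bleed has one tenth of the stem's amplitude, so the energy ratio is about 100 and the SDR comes out at roughly 20 dB. Less bleed in the estimate gives a higher SDR.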

Rhythm Recreation Study to Inform Intelligent Pedagogy Systems

Computer-based intelligent pedagogy systems have great potential to provide interactive music lessons to those unable to access conventional, face-to-face music instruction from human experts. A key component of any effective pedagogy system is the expert domain knowledge used to generate, present, and evaluate the teachable content that makes up the “syllabus” of the system [1]. In this thesis, we investigate the application of computational musicology algorithms to devise the “syllabus” of intelligent rhythm pedagogy software through a rhythm recreation study. A large part of this thesis is dedicated to the development of a web-based computer-aided rhythm tutor used as an experimental interface to conduct the rhythm recreation study. The resulting system can present targeted rhythm exercises to participants and provide immediate feedback on the audio of each participant’s rhythm performance.

Pupil dilation as a function of pitch discrimination difficulty: A replication of Kahneman and Beatty, 1967

This study replicates a seminal paper by Kahneman and Beatty (1967) on the use of pupillometry as an implicit measure of auditory processing load, specifically non-verbal auditory processing. Although it was published more than 50 years ago, Kahneman and Beatty's paper continues to be the primary citation supporting the claim that pupillometry is a reliable index of task difficulty for a simple non-verbal pitch discrimination task, and therefore an implicit measure of listening effort. This type of task requires very little explicit memory, is non-verbal, and relies heavily on low-level, automatic perceptual processing.

We ran two replication studies, one exact and one modified, and reproduced the main result only in the modified replication. The exact replication failed to reproduce the original effect on all nine statistical tests. Overall, our findings suggest that pupil dilation can be used as an implicit measure of task difficulty for a simple, non-semantic auditory task. However, the effect appears less robust than in the original study, and the variation across participants is much greater.