Tabla and Mridangam Transcription

Introduction to tabla

Tabla is the most widely used percussion instrument in Indian music, as both an accompanying and a solo instrument. Its two component drums are played with the fingers and hands and produce a wide variety of timbres, each of which has been named. There are approximately fifteen acoustically distinct strokes, which fall into three broad categories.

These named strokes form the basic vocabulary of tabla music and are played in sequence to form typical phrases. Tabla solo is a centuries-old tradition centered on extended, structured improvisations. The demands of this format have led to the sophisticated use of timbre and rhythm as foreground elements. In this music, the choice of strokes is precise, each one functioning like a note in a melody; the timbral and rhythmic structures are equally important and are carefully integrated into a singing line.

Introduction to mridangam

Mridangam is a South Indian percussion instrument that shares some of these characteristics with tabla. In particular, it has a vocabulary of timbrally distinct strokes, and while most of them do not sound much like tabla strokes, they fall into a similar set of broad categories. I have been experimenting with applying to mridangam many of the same techniques I have used for tabla transcription and recognition. Below is a video of a performance of Dangum, a duet between mridangam and computer, performed at Listening Machines 2008.



Dangum at Listening Machines 2008 from Parag Chordia on Vimeo

The system

This figure shows an overview of the Tabla Gyan system. The system listens to the audio stream from a tabla, identifies the stroke types and timings, and then applies a number of transformations that this abstracted form makes possible, before resynthesizing a response that is played back immediately through speakers. This architecture allows for a flexible call-and-response form, in which one can easily alter the character of the computer's response in real time.

Here we can see in a little more detail the structure of the stroke recognition algorithm. The incoming audio is segmented by an onset detector, and the timings are stored. Then, spectral features calculated on each segmented stroke are fed into a classifier trained on previously segmented and analyzed strokes, which outputs a label representing the type of stroke.
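The recognition steps described above (onset detection, spectral features per segmented stroke, a trained classifier) can be sketched in a few lines of Python. This is only a toy illustration, not the Tabla Gyan implementation: the energy-jump onset detector, the two features (spectral centroid and spread), the nearest-centroid classifier, the 8 kHz sample rate, and the synthetic decaying sines standing in for recorded strokes are all assumptions made for the example. The labels "ge" and "na" are used here simply as names for a low and a high stroke.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz; assumed for this toy example


def synth_stroke(freq, length=512, decay=0.01):
    """Synthesize a decaying sine as a stand-in for a recorded drum stroke."""
    return [math.exp(-decay * n) * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
            for n in range(length)]


def detect_onsets(signal, frame=64, ratio=4.0, floor=1e-4):
    """Return start indices of frames whose mean energy jumps sharply."""
    onsets, prev = [], 0.0
    for start in range(0, len(signal) - frame + 1, frame):
        energy = sum(x * x for x in signal[start:start + frame]) / frame
        if energy > floor and energy > ratio * prev:
            onsets.append(start)
        prev = energy
    return onsets


def spectral_features(segment):
    """Return (centroid, spread) in Hz of the power spectrum, via a naive DFT."""
    n = len(segment)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(segment))
        freqs.append(k * SAMPLE_RATE / n)
        power.append(abs(coeff) ** 2)
    total = sum(power) or 1.0
    centroid = sum(f * p for f, p in zip(freqs, power)) / total
    spread = math.sqrt(
        sum((f - centroid) ** 2 * p for f, p in zip(freqs, power)) / total)
    return (centroid, spread)


def train_centroids(examples):
    """Average the feature vectors of each labeled class."""
    return {label: tuple(sum(col) / len(col) for col in zip(*vectors))
            for label, vectors in examples.items()}


def classify(features, centroids):
    """Return the label whose class mean is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def transcribe(signal, centroids, window=256):
    """Segment the signal at onsets and label each stroke: [(seconds, label)]."""
    score = []
    for onset in detect_onsets(signal):
        segment = signal[onset:onset + window]
        label = classify(spectral_features(segment), centroids)
        score.append((onset / SAMPLE_RATE, label))
    return score


# Train on previously "segmented" strokes, then transcribe a new two-stroke signal.
training = {
    "ge": [spectral_features(synth_stroke(f)[:256]) for f in (240, 260)],
    "na": [spectral_features(synth_stroke(f)[:256]) for f in (950, 1050)],
}
centroids = train_centroids(training)
silence = [0.0] * 128
signal = silence + synth_stroke(250) + silence + synth_stroke(1000) + silence
score = transcribe(signal, centroids)
```

The resulting `score` is a list of (onset time, stroke label) pairs, which is the kind of abstracted representation the later transformations operate on. A real system would need a more robust onset detector, an FFT, and a richer feature set than this sketch uses.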

These are diagrams of some of the transformations that can be applied once the audio has been reduced to an abstracted score. Timbres, or stroke labels, can be easily remapped to other timbres or labels. Rhythmic transformations are also possible; here we show "conditional repetition," in which one type of stroke can be replaced with triplets, quintuplets, or generally "n-tuplets" of the same stroke.
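As a rough illustration of these two transformations, the sketch below operates on a hypothetical score format: a list of (start, duration, label) tuples measured in beats. The function names, the score layout, and the choice to divide the original stroke's duration evenly among the n repetitions are assumptions made for this example, not details of the actual Max/MSP implementation.

```python
from fractions import Fraction


def remap_timbres(score, mapping):
    """Replace stroke labels according to mapping; unmapped labels pass through."""
    return [(start, dur, mapping.get(label, label)) for start, dur, label in score]


def conditional_repetition(score, target, n):
    """Replace each stroke labeled `target` with n evenly spaced copies
    that together fill the original stroke's duration (an n-tuplet)."""
    out = []
    for start, dur, label in score:
        if label == target:
            step = dur / n
            out.extend((start + i * step, step, label) for i in range(n))
        else:
            out.append((start, dur, label))
    return out


# A three-stroke phrase, one stroke per beat; Fractions keep the timings exact.
phrase = [
    (Fraction(0), Fraction(1), "dha"),
    (Fraction(1), Fraction(1), "ge"),
    (Fraction(2), Fraction(1), "na"),
]
remapped = remap_timbres(phrase, {"na": "tin"})
tripled = conditional_repetition(phrase, "ge", 3)
```

Here `remapped` keeps the rhythm but swaps every "na" for "tin", while `tripled` turns the single "ge" on beat 1 into a triplet of "ge" strokes, each lasting a third of a beat. Because both functions take and return the same score format, they can be chained in any order.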

This is a screenshot of the latest interface to Tabla Gyan. The interface, along with many of the transformations described above, is implemented in Max/MSP. It allows the user to set a variety of parameters, choose among different training sets, receive cues from the system (e.g. whether it is in fact listening at that moment), and manipulate the response.