Tabla and Mridangam Transcription
Introduction to tabla
Tabla is the most widely used percussion instrument in Indian music, both as an accompanying and as a solo instrument. Its two component drums are played with the fingers and hands and produce a wide variety of timbres, each of which has been named. There are approximately fifteen acoustically distinct strokes, which fall into three broad categories:
1) resonant, ringing strokes played on the treble drum,
2) non-resonant, noisy strokes played on either drum,
3) low, resonant strokes played on the bass drum.
Introduction to mridangam
Mridangam is a South Indian percussion instrument that shares some of these characteristics with tabla. In particular, it has a vocabulary of timbrally distinct strokes; most do not sound much like tabla strokes, but they fall into a similar set of broad categories. I have been experimenting with applying many of the techniques I have used for tabla transcription and recognition to mridangam. Below is a video of a performance of Dangum, a duet for mridangam and computer, performed at Listening Machines 2008.
Dangum at Listening Machines 2008 from Parag Chordia on Vimeo
The system
This figure shows an overview of the Tabla Gyan system. The system listens to the audio stream from a tabla, identifies the stroke types and timings, and applies a number of transformations that this abstracted form makes possible. It then resynthesizes a response, which is played back immediately through speakers. This architecture allows for a flexible call-and-response form, in which one can easily alter the character of the computer's response in real time.
Here we can see in a little more detail the structure of the stroke recognition algorithm. The incoming audio is segmented by an onset detector, and the timings are stored. Then, spectral features calculated on each segmented stroke are fed into a classifier trained on previously segmented and analyzed strokes, which outputs a label representing the type of stroke.
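To make the recognition pipeline above concrete, here is a minimal sketch in Python with NumPy. It is not the Tabla Gyan implementation. The energy-based onset detector, the two spectral features (centroid and spread), the nearest-centroid classifier, the synthetic decaying-sine "strokes", and the labels "bass" and "treble" are all assumptions chosen to keep the example short.

```python
import numpy as np

def detect_onsets(signal, frame=256, threshold=0.1, min_gap_frames=2):
    """Return onset positions (sample indices) where frame energy jumps."""
    n_frames = len(signal) // frame
    if n_frames == 0:
        return []
    frames = signal[: n_frames * frame].reshape(n_frames, frame)
    energy = (frames ** 2).sum(axis=1)
    rise = np.diff(energy, prepend=0.0)  # positive where energy increases
    if rise.max() <= 0:
        return []
    limit = threshold * rise.max()
    onsets, last = [], -min_gap_frames
    for i, r in enumerate(rise):
        if r > limit and i - last >= min_gap_frames:
            onsets.append(i * frame)
            last = i
    return onsets

def spectral_features(signal, start, length, sr):
    """Spectral centroid and spread of a fixed-length segment (zero-padded)."""
    segment = np.zeros(length)
    chunk = signal[start:start + length]
    segment[: len(chunk)] = chunk
    spectrum = np.abs(np.fft.rfft(segment * np.hanning(length)))
    freqs = np.fft.rfftfreq(length, d=1.0 / sr)
    total = spectrum.sum() + 1e-12
    centroid = (freqs * spectrum).sum() / total
    spread = np.sqrt((((freqs - centroid) ** 2) * spectrum).sum() / total)
    return np.array([centroid, spread])

def train_centroids(features, labels):
    """Average the feature vectors of each label (nearest-centroid training)."""
    return {lab: np.mean([f for f, l in zip(features, labels) if l == lab], axis=0)
            for lab in set(labels)}

def classify(feature, centroids):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda lab: np.linalg.norm(feature - centroids[lab]))

def transcribe(signal, sr, centroids, length=512):
    """Segment by onsets, then label each segment: a list of (seconds, label)."""
    return [(start / sr, classify(spectral_features(signal, start, length, sr), centroids))
            for start in detect_onsets(signal)]

def synth_stroke(freq, sr, dur=0.1):
    """Hypothetical stand-in for a recorded stroke: a decaying sine burst."""
    t = np.arange(int(dur * sr)) / sr
    return np.sin(2 * np.pi * freq * t) * np.exp(-30 * t)

sr = 8000
bass, treble = synth_stroke(150, sr), synth_stroke(1500, sr)
centroids = train_centroids(
    [spectral_features(bass, 0, 512, sr), spectral_features(treble, 0, 512, sr)],
    ["bass", "treble"],
)
audio = np.zeros(sr)  # one second of silence with two strokes placed in it
audio[2048:2048 + len(bass)] = bass
audio[5120:5120 + len(treble)] = treble
score = transcribe(audio, sr, centroids)  # abstracted score: timings plus labels
```

The resulting list of timings and labels is the kind of abstracted score that later stages can transform. A real system would need a more robust onset detector and a richer feature set than this toy example uses.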
These are diagrams of some of the transformations that can be applied once the audio has been reduced to an abstracted score. Timbres, or stroke labels, can be easily remapped to other timbres or labels. Rhythmic transformations are also possible; here we show "conditional repetition," in which one type of stroke can be replaced with triplets, quintuplets, or generally "n-tuplets" of the same stroke.
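The two transformations just described can be sketched on a score held as a list of (time, label) pairs. This is an illustrative sketch in Python, not the Max/MSP implementation: the function names, the labels, the use of beats as the time unit, and the choice to spread the n repetitions evenly across the gap before the next stroke are all assumptions.

```python
def remap_timbres(score, mapping):
    """Replace stroke labels according to `mapping`; unmapped labels pass through."""
    return [(t, mapping.get(label, label)) for t, label in score]

def conditional_repetition(score, target, n, end_time):
    """Replace each `target` stroke with n evenly spaced copies of itself.

    The copies fill the interval up to the next stroke, or up to `end_time`
    for the last stroke. This spacing rule is an assumption of the sketch.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    out = []
    for i, (t, label) in enumerate(score):
        if label != target:
            out.append((t, label))
            continue
        next_t = score[i + 1][0] if i + 1 < len(score) else end_time
        step = (next_t - t) / n
        out.extend((t + k * step, label) for k in range(n))
    return out

# Hypothetical score, with times in beats and labels naming stroke categories.
score = [(0.0, "treble"), (1.5, "bass"), (3.0, "treble"), (4.5, "noisy")]

remapped = remap_timbres(score, {"treble": "bass"})
triplets = conditional_repetition(score, "treble", 3, end_time=6.0)
```

Because both functions take a score and return a new one, they can be chained in any order before the result is handed to a resynthesis stage.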
This is a screenshot of the latest interface to Tabla Gyan. The interface and many of the transformations described above are implemented in Max/MSP. It allows the user to set a variety of parameters, choose among different training sets, receive cues from the system (e.g. whether it is in fact listening at that moment), and manipulate the response.