Georgia Tech Music Intelligence Group (GTMIG)

We are interested in creating systems that can understand musical sound. Our guiding principles are that:

  • Machine understanding of music will serve as a foundation for applications that help us to enjoy music.
  • Understanding human cognition of musical sound will help us to develop these technologies and provides an excellent window into human cognition more generally.
  • Modeling music will help us to understand human creativity and lead to new creative partnerships between musicians as well as between musicians and technology.

Below we describe some of our current and proposed research; a full list of publications and accompanying presentations follows.

This research is supported by the National Science Foundation under Grants No. IIS-0855758 and IIS-1054659.


Music Information Retrieval

Predictive Music Modeling (NSF Award IIS-1054659)

When a person is listening to a song, she is anticipating, at any given moment, the timing and nature of the next event by decoding the musical signal. Even when analyzing a simple song, the brain utilizes complex correlations between the musical elements to make accurate predictions. Musical signals are richly patterned, with long-term dependencies, dependencies across time-scales, and correlations between parallel information streams; the melody depends on the rhythm, the rhythmic patterns depend on the form, and the intonation of the pitch depends on the placement within the phrase. The goal of this NSF CAREER project is to develop machine-learning (ML) models for predicting temporally structured events in the context of music, which take advantage of these complex correlations, and to use these models to help explain human musical expectation.
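As a minimal illustration of event prediction from learned statistics (not the project's actual models, which are far richer), the sketch below trains a first-order predictor on a toy melody of scale degrees and predicts the most likely next event given the previous one:

```python
from collections import Counter, defaultdict

def train_bigram(sequence):
    """Count how often each event follows each other event."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the most likely next event given the previous one."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]

# Toy melody as scale degrees; 3 follows 2 more often than any other event.
melody = [1, 2, 3, 2, 1, 2, 3, 4, 3, 2, 1, 2, 3, 2]
model = train_bigram(melody)
print(predict_next(model, 2))  # 3
```

Real musical prediction must also handle the longer-range and cross-stream dependencies described above, which is precisely what such a first-order model cannot capture.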

Modeling Musical Creativity (NSF Award IIS-0855758)

This project seeks to understand, model, and support improvisation, or realtime creativity, in the context of music. The study will use an interdisciplinary approach involving ethnography, music theory, statistical modeling, machine learning, signal processing, instrument design, and cognitive studies. The objectives are to develop computational models of improvisation and to use them to develop new technologies that support creativity in music and education.

Raag Recognition - Raag Vidya

To our knowledge, we have created the first automatic raag recognition system, which we call Raag Vidya. Our work is based on creating a theoretical framework that allows us to translate raag into a representation that reveals underlying structure, despite tremendous variety and complexity in the performance of raag and the sometimes subtle distinctions between raags.

Just as key recognition is essential for understanding Western tonal music (including pop, rock, jazz, etc.) -- because it is fundamental to understanding the melodic and harmonic content of a piece -- raag understanding is essential for systems that interact with Indian music. Further, our attempts to automatically recognize raag have provided theoretical insights into the nature of raag.
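One simple representation in this spirit is the pitch-class (scale-degree) distribution of a performance. The sketch below, with hypothetical template distributions for two raags (Yaman, which uses the sharp fourth, and Bilawal, which uses the natural fourth), classifies a melody by finding the nearest template; it is a toy version of the idea, not the Raag Vidya system itself:

```python
import numpy as np

def pcd(scale_degrees, n=12):
    """Normalized pitch-class distribution over the 12 semitones."""
    hist = np.bincount(np.asarray(scale_degrees) % n, minlength=n).astype(float)
    return hist / hist.sum()

def classify(query, templates):
    """Pick the raag whose template distribution is closest to the query's
    distribution (Euclidean distance keeps the sketch simple)."""
    return min(templates, key=lambda name: np.linalg.norm(pcd(query) - templates[name]))

# Hypothetical templates: Yaman uses the sharp fourth (6 semitones above the
# tonic), Bilawal the natural fourth (5 semitones).
templates = {
    "yaman":   pcd([0, 2, 4, 6, 7, 9, 11] * 4),
    "bilawal": pcd([0, 2, 4, 5, 7, 9, 11] * 4),
}
performance = [0, 2, 4, 6, 7, 6, 4, 2, 0, 7, 9, 11, 9, 7]
print(classify(performance, templates))  # "yaman"
```

A real system must cope with ornamentation, microtonal inflection, and the emphasis structure of a raag, which a bare pitch-class histogram ignores.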

Realtime Tabla and Mridangam Recognition - Tabla Gyan and Dangum

Tabla and mridangam are the main percussion instruments of North and South India, respectively. Both traditions are highly virtuosic and are organized around timbre: "melodies" are created from timbres rather than pitches and are organized in a highly structured way. The system is analogous to language in that small units are combined hierarchically to form larger expressions.

We have created the first systems that are able to 'listen' to tabla and mridangam and understand the rhythmic and timbral information, i.e., what stroke was played and when. We have begun using this information to create realtime interactive systems such as Tabla Gyan and Dangum.
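Stroke recognition of this kind amounts to labeling each detected onset by its timbre. As a rough sketch (the feature values and stroke names here are purely illustrative, not taken from the actual systems), a nearest-centroid classifier over timbre features might look like:

```python
import numpy as np

# Hypothetical 2-D timbre features per stroke (e.g. brightness, decay length),
# standing in for features extracted from real tabla audio.
training = {
    "na": np.array([[0.80, 0.20], [0.75, 0.25], [0.85, 0.15]]),  # bright, short decay
    "ge": np.array([[0.20, 0.90], [0.25, 0.85], [0.15, 0.95]]),  # dark, long decay
}
centroids = {stroke: feats.mean(axis=0) for stroke, feats in training.items()}

def classify_stroke(features):
    """Label an onset by its nearest stroke centroid in feature space."""
    return min(centroids, key=lambda s: np.linalg.norm(features - centroids[s]))

print(classify_stroke(np.array([0.78, 0.22])))  # "na"
```

A realtime system additionally needs low-latency onset detection and features robust to playing dynamics, which this sketch omits.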

Content Based Recommendation - Hubs and Domain-specificity

One of the most active areas of MIR research is content-based recommendation, analyzing the semantic content of songs to judge their relatedness. This information is used to make recommendations of the form: "if you like A, you'll also like B".
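In its simplest form, content-based recommendation ranks the catalog by similarity between feature vectors extracted from the audio. The sketch below (with made-up feature vectors, e.g. standing in for averaged timbre features) ranks candidate songs by cosine similarity to a liked song:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(liked, catalog):
    """Return the other songs ordered by content similarity to the liked song."""
    others = {t: v for t, v in catalog.items() if t != liked}
    return sorted(others, key=lambda t: cosine(catalog[liked], others[t]), reverse=True)

# Hypothetical timbre feature vectors for four songs.
catalog = {
    "A": np.array([1.0, 0.1, 0.0]),
    "B": np.array([0.9, 0.2, 0.1]),
    "C": np.array([0.0, 1.0, 0.9]),
    "D": np.array([0.1, 0.9, 1.0]),
}
print(recommend("A", catalog))  # "B" ranks first
```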

We have recently been working on two problems. The first attempts to understand why certain songs are consistently recommended even when they are inappropriate, often referred to as the "hubness" problem. Our work has explored whether hubness is an artifact of the similarity measures or a real property of the data, and has investigated approaches to minimize it. My student Mark Godfrey has been blogging about this work.
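Hubness is commonly quantified by the k-occurrence of each point: how often it appears in other points' k-nearest-neighbor lists. In high-dimensional feature spaces this distribution becomes skewed, so a few "hub" songs turn up in far more lists than average. A minimal sketch of the measurement (illustrative only, on random data):

```python
import numpy as np

def k_occurrence(X, k=5):
    """For each point, count how often it appears in the other points'
    k-nearest-neighbor lists; a skewed count distribution indicates hubs."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-matches
    knn = np.argsort(d, axis=1)[:, :k]     # k nearest neighbors of each point
    return np.bincount(knn.ravel(), minlength=len(X))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))              # high-dimensional points tend to produce hubs
counts = k_occurrence(X, k=5)
print(counts.max(), counts.mean())         # hub points exceed the average count of k
```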

The second problem we are addressing is customizing recommendation systems so that the similarity models that underlie them exploit knowledge about particular musical styles, in our case Indian music. Because Indian music tends to be much less polyphonic than Western music, and is often based on raag, it is possible to extend current CBR models that focus exclusively on timbre to incorporate melodic information. In this work, we attempt to pitch-track the main melodic line and use these data to understand the melodic and tonal characteristics of the work. We build on our raag recognition work, which is based on scale degree statistics of the melody.

Music Perception and Cognition

Emotion in Raag

I am interested in constructing a cognitively grounded theory of raag, the fundamental melodic concept of Indian music, that explains how raags evoke emotions. I have conducted research that shows that raags do in fact reliably evoke different clusters of emotions (Chordia 08). To understand this, I have also conducted experiments to understand the role of basic acoustic cues such as sensory dissonance (Chordia 07) as well as responses that depend on tension and relaxation due to statistically learned schemas.

Statistical Learning and Expectation Evoked Emotion (with David Huron)

It has been shown that listeners internalize statistical properties of music, such as how frequently certain chords are used. We have been examining whether certain emotional responses to raag music can be traced to pitch statistics such as frequency of usage and conditional frequency of usage (i.e., how often one note follows another). We have also begun examining the role that micro-pitch structure, such as pitch glides and ornaments, plays in evoking emotion.
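The conditional frequencies in question can be estimated directly from note sequences, and the "unexpectedness" of a continuation expressed as its surprisal (negative log probability). The toy sketch below, on a made-up melody of scale degrees, shows a common continuation registering as less surprising than a rare one:

```python
import math
from collections import Counter, defaultdict

def transition_probs(sequence):
    """First-order conditional frequencies: P(next | previous)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    return {p: {n: c / sum(cs.values()) for n, cs in [(p, cs)] for n, c in cs.items()}
            for p, cs in counts.items()}

def surprisal(probs, prev, nxt):
    """Information content of hearing `nxt` after `prev`; rare continuations
    carry more bits and are heard as more surprising."""
    return -math.log2(probs[prev][nxt])

melody = [0, 2, 4, 2, 0, 2, 4, 5, 4, 2, 0, 2, 4, 2, 0]
probs = transition_probs(melody)
# The common continuation 2 -> 4 is less surprising than the rare 4 -> 5.
print(surprisal(probs, 2, 4), surprisal(probs, 4, 5))  # 1.0 2.0
```

The hypothesis under study is that such statistically learned expectations, and their violation, contribute to the emotional responses raags evoke.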

Because there are many correlated features in real music, we are also designing experiments using "artificial" music systems. This has two advantages: we are able to precisely control what musical parameter is varied, and we are able to control the exposure of subjects to the artificial style. We are using this paradigm to explore the age-old question of why minor keys sound "sad".

Basic Auditory Cues for Emotion (with Vinod Menon [Stanford], David Huron [OSU], and Daniel Abrams [Stanford])

Fundamental to survival is the balance between fear and exploratory behavior. It is known that basic auditory cues such as sudden intensity changes cause orienting responses. In this work we are exploring amygdala activation in response to simple stimuli such as rising and falling intensity and rising and falling pitch tones. We are particularly interested in asymmetric processing of rising and falling cues. We are also interested in the temporal pattern of neuronal activation for oddball stimuli where it is hypothesized that a fast fear response is followed by an inhibitory response.

Why study Indian music?

I am often asked why I tend to focus my applications on Indian music. The first reason is that Indian music encompasses a vast array of important musical styles. Second, the underlying melodic and rhythmic frameworks are common to a great deal of music from South Asia, the Middle East, and North Africa. Third, I believe that advances in music technologies will be sparked...



I have assembled two substantial databases, primarily of Indian classical music.


A large collection covering 31 raags, including several lengthy recordings made specifically for this project. Some are studio recordings with no accompaniment (no drone or percussion).


A diverse collection covering a wide range of raags and sub-genres.