Raag Classification

Introduction to raag

The influence of raag music is widespread. In the Indian subcontinent, diverse styles such as Qawwali, Ghazal, Thumri, and Hindi film music are all largely based on or heavily influenced by this conception of melody. Looking beyond the Indian subcontinent, one sees similar melodic systems in the Middle East and North Africa. With the exception of tonal Western music, raag is arguably the most influential melodic system in the world.

North Indian classical music is one of the oldest continuous musical traditions in the world, and it is an active contemporary performance practice. The repertoire is extensive, consisting of dozens of styles and hundreds of significant performers. The most prevalent instruments, such as sitar, sarod and tabla, are timbrally quite different from popular Western instruments. It is an oral tradition, and recordings therefore represent the primary materials.

Almost all North Indian classical music (NICM) is organized around a melodic abstraction known as raag. A raag is most easily explained as a collection of melodic gestures and a technique for developing them. The gestures are sequences of notes that are often inflected with various micro-pitch alterations and articulated with an expressive sense of timing. Longer phrases are built by joining these melodic atoms together.

Although there is considerable continuous pitch motion due to the way notes are connected and ornamented, it is nevertheless accurate to consider melodies to be composed from a discrete set of pitches, or a scale. These notes are drawn from the twelve chromatic pitches of a just-intoned scale. There are almost no raags in which stable, held tones fall outside of this chromatic scale. Micro-pitch structure, however, is often essential and the same nominal note may take on a different character depending on how it is articulated. In some raags there are consistent pitch-time trajectories that are essential to the character of the raag. For example, in raag Shree there are several ways of articulating the minor second; in one instance the pitch oscillates between the minor second and the tonic, creating the sense of a flattened minor second. However, a musician would never simply hold this pitch without the characteristic oscillation and linkage to the stable tone.

Indian classical music (ICM) uses several hundred raags, of which one hundred are common. There are theoretically thousands of scale types; in practice, however, raags conform to a much smaller set of scales, and many of the most common raags share the same set of notes.

The presentation of a raag typically proceeds in several sections. In the first section (alap), the main melodic instrument, accompanied only by the drone, slowly develops the melodic framework. In later sections, with the accompanying tabla (the main NICM percussion instrument), the emphasis shifts to faster sequences of notes, leaving behind most of the subtleties of pitch articulation. In both cases, the characteristic phrases of the raag are often repeated with variations.

Pitch Class Distributions

Pitch Class Distributions (PCDs) are a compact way of describing the tonal content of a piece or excerpt of music. A PCD is simply a histogram of the pitches that occur in the piece, and is usually generated automatically using a pitch detection algorithm. Normally, these pitches are discretized to a standard scale; that is, we assume that all the relevant material falls within a known scale, and that pitches do not vary freely. PCDs may be weighted by the duration of the pitches encountered, so that a note which is held for a long time gets a correspondingly greater weight in the histogram. This duration weighting is inherent to the calculation when the audio is processed frame by frame, because each frame is analyzed independently, without considering whether it belongs to the same "note" as the previous or following frame.
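As a minimal sketch of the frame-by-frame counting described above, assume a pitch tracker has already labeled each frame with a pitch class from 0 to 11, or with None where no pitch was found. The function name frame_pcd and the sample labels are hypothetical and serve only as illustration; this is not the implementation used in the study.

```python
from collections import Counter

def frame_pcd(frame_pitch_classes):
    """Count pitch classes (0-11) once per analysis frame.

    Frames labeled None (assumed here to mean "no pitch detected")
    are skipped. Because every frame counts once, a note held for
    many frames automatically receives a larger count.
    """
    counts = Counter(pc for pc in frame_pitch_classes if pc is not None)
    return [counts.get(pc, 0) for pc in range(12)]

# Hypothetical frame labels: pitch class 0 held for six frames,
# one unpitched frame, then pitch classes 4 and 7 for two frames each.
frames = [0] * 6 + [None] + [4] * 2 + [7] * 2
pcd = frame_pcd(frames)
print(pcd)
```

No explicit duration weighting appears in the code: the held note dominates the histogram simply because it occupies more frames.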

It has been shown that listeners are sensitive to this distribution. Krumhansl, using the now famous probe-tone method, showed that the tones listeners judged as most fitting in a particular tonal context were the most frequently used tones in the key. Others have shown that when listeners make speeded judgments about final melodic tones, reaction time is inversely proportional to the frequency of the tone in that key context.

When analyzing an audio signal to derive a PCD, several issues come up. First, the pitch data must be extracted. This is a non-trivial task in many circumstances, and works best when the input is essentially monophonic; while multi-pitch estimation is a progressing field, the techniques are not yet robust or reliable. Typically, the input signal is broken into very short frames (several milliseconds) of audio, which are separately analyzed for pitch content [yin, hpcp]. Neighboring frames may be compared to generate a more reliable pitch estimate. The resulting pitch tracks must then be assigned to frequency bins, whose boundaries are set halfway between chromatic scale degrees (on a logarithmic frequency scale). To define these bins, we need to know where the scale is centered, both to place the bin boundaries and to label the scale degrees properly (e.g. which note is the tonic) when we analyze the PCD. In our case, we manually annotate each track we intend to analyze by tuning an oscillator while closely listening to the music. A future improvement in the process will be automatic annotation.
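The binning step can be sketched as follows, assuming the tonic frequency is already known from manual annotation. Rounding the distance from the tonic, measured in semitones, to the nearest integer places each bin boundary halfway between adjacent scale degrees on a logarithmic frequency axis. The function name scale_degree is hypothetical, and the sketch assumes equal-tempered bin centers as an approximation to the just-intoned scale described earlier.

```python
import math

def scale_degree(freq_hz, tonic_hz):
    """Return the chromatic scale degree (0 = tonic, ..., 11) nearest to freq_hz.

    The interval from the tonic is measured in semitones on a log2 scale,
    rounded to the nearest integer, and folded into one octave.
    """
    semitones = 12.0 * math.log2(freq_hz / tonic_hz)
    # floor(x + 0.5) rounds to the nearest integer, with halves rounding up.
    return math.floor(semitones + 0.5) % 12

# Hypothetical tonic of 220 Hz.
tonic = 220.0
print(scale_degree(220.0, tonic))   # the tonic itself
print(scale_degree(330.0, tonic))   # a just fifth above the tonic
print(scale_degree(440.0, tonic))   # the tonic one octave higher
```

Folding with the modulo operator makes the label independent of octave, which is what turns a pitch histogram into a pitch class histogram.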

Once this process is complete, we have a PCD for a given chunk of music, which gives us the relative occurrence of each of the twelve possible chromatic scale degrees. Typically, the PCD is normalized so that each value represents the percentage of audio frames which contain primarily that pitch. One notable feature of real-world PCDs which come from audio signal analysis is that they are quite messy. Every chromatic pitch has some non-zero value, so even notes which are not "present" in a given raag, or a given performance or excerpt, will still show up in the histogram. This is partially a result of inaccuracies in the pitch-tracking, but also of the glides which are so common in Indian music, in which a performer may connect two notes by smoothly sliding from one to the other. While we may hear it as a connecting gesture between the two "real" notes, the intermediate pitches are binned and counted in the histogram.
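The effect of glides can be illustrated with a small synthetic example. The sketch below builds a hypothetical pitch track (a held tonic, a ten-frame glide, then a held note four semitones higher), bins it, and normalizes the counts to fractions that sum to one; multiplying by 100 would give percentages. All names and values are illustrative assumptions rather than data from the study.

```python
import math

def normalized_pcd(freqs_hz, tonic_hz):
    """Bin frame-level frequencies into 12 scale degrees and normalize."""
    counts = [0] * 12
    for f in freqs_hz:
        degree = math.floor(12.0 * math.log2(f / tonic_hz) + 0.5) % 12
        counts[degree] += 1
    total = sum(counts)
    return [c / total for c in counts]

tonic = 220.0  # hypothetical tonic in Hz
held_low = [tonic] * 40                      # tonic held for 40 frames
held_high = [tonic * 2 ** (4 / 12)] * 40     # degree 4 held for 40 frames
# Ten glide frames, evenly spaced in semitones strictly between the two notes.
glide = [tonic * 2 ** ((4 * (i + 1) / 11) / 12) for i in range(10)]

pcd = normalized_pcd(held_low + glide + held_high, tonic)
for degree, value in enumerate(pcd):
    print(degree, round(value, 3))
```

Degrees 1, 2 and 3 receive small non-zero values even though no note was ever held there, which is the kind of leakage described above.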

The following figure is a block diagram showing an overview of the process of PCD extraction. The inclusion of an onset detection algorithm allows the option of collecting a note-by-note histogram (that is, not weighted by duration).

PCD extraction block diagram

Pitch Class Dyad Distributions (PCDDs) incorporate a small amount of sequential information. They are histograms of bi-grams; that is, they represent the relative prevalence of all possible two-note sequences. There are of course many more possible two-note combinations (12 × 12 = 144) than individual notes (12), so the data is much sparser. Calculating the PCDD requires a different approach from PCD calculation: in order to count note transitions, we must first identify the notes, and cannot simply operate at the frame level. Onset detection techniques are reasonably well developed, but some difficulties arise in the case of Indian classical music. The main complication of calculating PCDDs in this musical context is the occurrence of notes that are played by sliding up or down to that pitch from a previous note, without any clear onset. When pitches are histogrammed for each time frame, as in PCDs, this poses no problem. In PCDDs, however, it does, and ideally the algorithm would not rely on explicit onsets (e.g. the percussive pluck of a string or a discrete change in pitch). This problem has not yet been solved, so the PCDDs entail a certain level of abstraction: some of the values recorded in them are in actuality the bi-grams for the closest pairs of clearly articulated notes, rather than simply bi-grams of adjacent notes. This also makes PCDDs substantially more vulnerable than PCDs to variations in recording quality, accompaniment, and instrumentation.
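The bi-gram counting itself is simple once a note sequence is available. In the sketch below, note segmentation is replaced by a crude stand-in that merges runs of identical frame labels; this is an assumption made for illustration only, it is not the onset detection described above, and it cannot represent a note that is re-articulated at the same pitch. The function names collapse_repeats and pcdd are hypothetical.

```python
def collapse_repeats(frame_degrees):
    """Merge consecutive identical frame labels into single notes."""
    notes = []
    for degree in frame_degrees:
        if not notes or notes[-1] != degree:
            notes.append(degree)
    return notes

def pcdd(notes):
    """Return a 12 x 12 matrix of normalized two-note transition counts.

    Entry [a][b] is the fraction of transitions that go from
    scale degree a to scale degree b.
    """
    matrix = [[0.0] * 12 for _ in range(12)]
    for a, b in zip(notes, notes[1:]):
        matrix[a][b] += 1.0
    total = sum(sum(row) for row in matrix)
    if total > 0:
        matrix = [[value / total for value in row] for row in matrix]
    return matrix

# Hypothetical frame-level scale degrees for a short phrase.
frames = [0, 0, 0, 2, 2, 4, 4, 4, 2, 2, 0]
notes = collapse_repeats(frames)
dyads = pcdd(notes)
print(notes)
print(dyads[0][2], dyads[2][4], dyads[4][2], dyads[2][0])
```

Flattening the matrix row by row gives the 144-element vector form; with only four transitions in this phrase, 140 of those entries are zero, which shows how sparse the representation is on short excerpts.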

The following figure is a block diagram showing an overview of the process of PCDD extraction. Due to the size of the PCDD vector (or matrix), we may only be interested in a few key points; a few such hypothetical points are highlighted in the figure.

PCDD extraction block diagram

Raag Database (GTraagDB)

The database was assembled from a variety of sources. The samples were chosen to include considerable diversity across several dimensions: in raag, musician, instrument, playing style, presence or absence of accompaniment, and recording quality. Commercial recordings were included along with close to twenty hours of unaccompanied raags recorded specifically for this study.

A total of thirty-one raags were represented in the database, comprising a significant fraction of the commonly played corpus of ICM.

Most of the performances used had both a very slow, unmetered section (alap) and a faster, rhythmic section (bandish or gat). Performances by nineteen musicians were included; six were instrumentalists playing either the plucked string instruments sarod or sitar, or the blown instruments shenai or flute, and the remaining thirteen were vocalists, both male and female. Recordings made expressly for this project were unaccompanied by either drone or tabla, providing a clean and isolated signal, while the commercial recordings contained a full range of accompaniment and were sometimes of significantly degraded sound quality.