Monday, December 21, 2015

Modeling rhythms using numbers


I like to dabble a bit in generative music from time to time. While thinking about how to generate percussion patterns, I was wondering about compact representations of rhythm. This blog entry documents my current approach (which, as usual, may or may not exist already, and may or may not be useful to you).

(Note to self: this blog entry lacks some pictures for clarity.) 

Encoding rhythm in a number

Consider the following simple rock beats:

Each beat has 3 voices:
  • the upper voice represents the hi-hat
  • the middle voice represents the snare drum
  • the lower voice represents the bass drum (kick)

Simple case: a voice has notes with equal duration (e.g. upper staff)

In the upper staff, there's an easy conversion between notes being present/absent and the bits in a binary number.

For the hi-hat, consider each measure as consisting of eight 8th notes. To each 8th note we can associate a bit in a binary number. Since an 8th note is played at every one of these positions, all bits are set to one. The hi-hat in the first measure can therefore be represented as the binary number (1111 1111), which is 255 in decimal, with a resolution of 2. The resolution is the number of bits per beat; here it is 2 because there are two 8th notes per beat.

The bass drum can be seen as consisting of 4th (quarter) notes. There's a kick on the first and third beat, but not on the second and fourth beat of the measure. The bass drum voice in the first measure can therefore be represented as the binary number (1010) (in decimal this is number 10) with a resolution of 1.

The snare drum also consists of 4th notes. There's a snare drum on beats 2 and 4, but not on beats 1 and 3 of the measure. For this reason, the snare drum can be represented as (0101) (in decimal this is number 5) with a resolution of 1.

The complete first measure therefore can be summarized as:
  • hi-hat: 255 (resolution: 2)
  • snare drum: 5 (resolution: 1)
  • bass drum (kick): 10 (resolution: 1)
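
As a quick check of the arithmetic, here is a minimal Python sketch. The helper name `encode_binary` is made up for this post (it is not from any library); it simply reads a string of 0s and 1s as a binary number, ignoring the spaces I use for readability.

```python
def encode_binary(pattern: str) -> int:
    """Interpret a string of '0'/'1' steps (spaces ignored) as a binary number."""
    return int(pattern.replace(" ", ""), 2)

# First measure of the upper staff: voice -> (digits, resolution in bits per beat)
measure = {
    "hi-hat": ("1111 1111", 2),
    "snare drum": ("0101", 1),
    "bass drum (kick)": ("1010", 1),
}

for voice, (digits, resolution) in measure.items():
    print(f"{voice}: {encode_binary(digits)} (resolution: {resolution})")
```

Note that the decimal number alone does not tell you how many leading rests there are, so in practice I would keep the number of digits around together with the resolution.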

Second case: a voice has notes with unequal duration (e.g. lower staff)

The simple notation we used before no longer suffices. The bass drum voice has a kick on the first beat, and one on the second half of the second beat. As a first idea, we can pretend that the notes are written as 8th notes that are tied together. In that case the bass drum could almost be modeled as (1101) with a resolution of 2, except that this number does not capture that the first two 8th notes are tied together to form one longer kick.

To overcome this limitation, introduce a new digit: 2. The digit 2 indicates that the current note is present and tied to the next note (whereas digit 1 indicates that the current note is present but not tied to the next one). An accurate representation for the bass drum in the lower staff is therefore (2101) with a resolution of 2. Because of the digit 2, this is no longer a binary number, but it can be interpreted as a ternary number (a number in number base 3). (2101) in number base 3 corresponds to decimal number 64.

Since one cannot tie a note to a rest, the number combination 20 doesn't make any sense and for all practical purposes can be replaced with 10. 
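
The ternary encoding, including the 20 -> 10 clean-up, can be sketched in a few lines of Python. The function names `normalize` and `encode_ternary` are hypothetical helpers of my own, and treating a stray 20 by silently rewriting it is just one way to handle it.

```python
def normalize(digits: str) -> str:
    """Drop spaces and replace the meaningless tie-to-rest '20' with '10'."""
    return digits.replace(" ", "").replace("20", "10")

def encode_ternary(digits: str) -> int:
    """Interpret a string of 0/1/2 steps as a number in base 3."""
    return int(normalize(digits), 3)

print(encode_ternary("2101"))  # bass drum of the lower staff
print(encode_ternary("2001"))  # same value as "1001": the tie into a rest is dropped
```

A single replacement pass is enough here, because rewriting 20 as 10 can never create a new 20.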

Without loss of generality, we can also interpret the numbers of the upper staff as numbers in number base 3. The upper staff therefore is modeled as:
  • hi-hat: (1111 1111) in number base 3, or 3280 in number base 10, (resolution: 2)
  • snare drum: (0101) in number base 3, or 10 in number base 10, (resolution: 1)
  • bass drum (kick): (1010) in number base 3, or 30 in number base 10, (resolution: 1)
The lower staff is modeled as:
  • hi-hat: (1111 1111) in number base 3, or 3280 in number base 10, (resolution: 2)
  • snare drum: (0101) in number base 3, or 10 in number base 10, (resolution: 1)
  • bass drum (kick): (2101) in number base 3, or 64 in number base 10, (resolution: 2)
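
To convince myself that the digits plus the resolution really carry everything a player needs, here is a rough Python sketch that turns a digit string back into notes, each given as (onset, duration) in beats. The function `to_events` is a hypothetical helper; letting a pattern that ends on a 2 ring until the end of the pattern is my own assumption, as is cutting a note short when a 2 runs into a rest.

```python
from fractions import Fraction

def to_events(digits: str, resolution: int):
    """Return a list of (onset, duration) pairs in beats for a 0/1/2 digit string."""
    digits = digits.replace(" ", "")
    step = Fraction(1, resolution)  # length of one digit, in beats
    events = []
    start = None  # index where the currently sounding note began
    for i, d in enumerate(digits):
        if d == "0":
            if start is not None:  # dangling tie ("20"): end the note before the rest
                events.append((start * step, (i - start) * step))
                start = None
            continue
        if start is None:
            start = i
        if d == "1":  # a 1 closes the note
            events.append((start * step, (i + 1 - start) * step))
            start = None
    if start is not None:  # assumption: a trailing tie rings to the end of the pattern
        events.append((start * step, (len(digits) - start) * step))
    return events

for onset, duration in to_events("2101", 2):
    print(f"onset {onset}, duration {duration}")
```

For (2101) with resolution 2 this gives a kick on the first beat lasting one beat, and a second kick one and a half beats in, lasting half a beat.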

What's the point?

Any non-negative integer can be rewritten in number base 3 and vice versa, so any such integer represents a drum pattern voice (once you fix a resolution and a number of digits, since leading rests are leading zeros), and every drum pattern voice can be written as a single integer. Drum pattern voices can therefore be enumerated and constructed systematically.
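
Enumerating voices then comes down to counting. The sketch below uses a hypothetical helper `to_base3` that writes an integer as a fixed number of base-3 digits. Skipping patterns that contain 20 (instead of rewriting them to 10) is my own choice here, to avoid listing the same rhythm twice.

```python
def to_base3(value: int, length: int) -> str:
    """Write a non-negative integer as base-3 digits, zero-padded on the left to `length`."""
    digits = ""
    while value > 0:
        value, d = divmod(value, 3)
        digits = str(d) + digits
    return digits.rjust(length, "0")

# Every 2-step voice: the integers 0..8 cover all of them.
for n in range(3 ** 2):
    print(n, to_base3(n, 2))

# All 4-step voices, and the ones left after dropping the tie-to-rest combination "20".
voices = [to_base3(n, 4) for n in range(3 ** 4)]
valid = [v for v in voices if "20" not in v]
print(len(voices), len(valid))
```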

Hah! I bet you can't do triplets can you?

Why not? Of course I can. Suppose you have a drum pattern voice that mixes 8th notes with 8th-note triplets. You can again consider each 8th note as consisting of 3 tied triplet 16th notes, and each triplet 8th note as consisting of 2 tied triplet 16th notes. The resolution is 6 (since there are 6 triplet 16ths per beat), and the pattern for one beat of 8th-note triplets is (212121) (decimal: 637), whereas the pattern for one beat of 8th notes is (221221) (decimal: 700).

How did I know I had to use a resolution of 6? A single beat has 3 triplet 8th notes, or 2 8th notes. The least common multiple of 2 and 3 is 6. Therefore I had to subdivide the beat into 6 equal parts (which corresponds to using a triplet 16th as reference length).
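
The same reasoning in Python, as a small sketch. `common_resolution` and `even_notes` are names I made up for illustration; `even_notes` builds one beat of equally long notes, where each note is a run of 2s closed by a 1.

```python
from functools import reduce
from math import gcd

def common_resolution(*notes_per_beat: int) -> int:
    """Least common multiple of the given subdivisions of the beat."""
    return reduce(lambda a, b: a * b // gcd(a, b), notes_per_beat, 1)

def even_notes(notes_per_beat: int, resolution: int) -> str:
    """One beat of equal notes: each note is (length - 1) tied steps plus a closing 1."""
    length = resolution // notes_per_beat
    return ("2" * (length - 1) + "1") * notes_per_beat

resolution = common_resolution(2, 3)    # 6
triplets = even_notes(3, resolution)    # "212121"
eighths = even_notes(2, resolution)     # "221221"
print(resolution, triplets, int(triplets, 3), eighths, int(eighths, 3))
```

Mixing in, say, quintuplets works the same way: pass 5 as an extra argument and the resolution grows accordingly.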

Is this system general enough to encode mixtures of different tuplets?

It is, but if you go to very exotic rhythms, you may end up with large resolutions and many digits. Rest assured: in popular practice most rhythms don't need very complex encodings.

Can you convert representations between different resolutions?

To some extent, yes, but not every pattern can be expressed in any resolution without loss of information. In number base 3, if you understand what we're doing here, it's rather trivial. E.g.
  • (1010) with resolution 2 corresponds to (21002100) in resolution 4. What we did here is replace every 8th note with tied 16th notes. This boils down to applying rewriting rules 1 -> 21 and 0 -> 00. This is "up"sampling the rhythmic representation, and it may be a preparation step for other transformations later on.
  • Similarly, we can upsample (21002100) in resolution 4 to (2221000022210000) in resolution 8. Here we replaced every 16th note with tied 32nd notes. This boils down to applying rewriting rules 2 -> 22, 1 -> 21, 0 -> 00.
  • If you want to halve the resolution, you process the base 3 digits in groups of two:
    • Take drum pattern represented by 2221000022210000 in number base 3, and group by 2: (22,21,00,00,22,21,00,00)
    • Then substitute: 22 -> 2, 21 -> 1, 00 -> 0 (This is a form of "down"-sampling without loss of information)
    • If you encounter other patterns than 22, 21 or 00, you cannot reduce the resolution without mutilating the rhythm. In that case you down-sample while losing some information (a kind of low-pass filtering).
    • If you downsample the rhythm to a lower resolution, and while doing so are forced to mutilate the rhythm, you can upsample it again, then subtract the resulting number from the original number to get an error rhythm (a kind of high-pass filtered rhythm).
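
The rewriting rules above translate almost literally into Python. This is only a sketch: `upsample` and `downsample` are hypothetical helper names, and I chose to return None when halving the resolution would lose information, rather than pick one particular lossy strategy.

```python
# Rewriting rules for doubling the resolution, and their inverse for halving it.
UP = {"0": "00", "1": "21", "2": "22"}
DOWN = {pair: digit for digit, pair in UP.items()}

def upsample(digits: str) -> str:
    """Double the resolution by replacing every step with two tied steps."""
    return "".join(UP[d] for d in digits)

def downsample(digits: str):
    """Halve the resolution, or return None if that cannot be done losslessly."""
    if len(digits) % 2:
        return None
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    if any(pair not in DOWN for pair in pairs):
        return None
    return "".join(DOWN[pair] for pair in pairs)

print(upsample("1010"))                # 21002100
print(upsample(upsample("1010")))      # 2221000022210000
print(downsample("2221000022210000"))  # 21002100
print(downsample("2101"))              # None: the pair 01 has no lower-resolution digit
```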

Are digits 0,1,2 enough to notate any rhythm?

Yes and no. Yes: you can form any rhythm using this system. No: you cannot accurately annotate certain expressive marks (e.g. staccato, marcato, ghost notes) using this system. Adding such information should be possible by introducing new digits (which themselves can encode different dimensions of information, e.g. by forming the digits by multiplying prime factors, where presence of a given prime factor indicates presence of a certain expressive mark). In that case not every conceivable number is a valid rhythm anymore and things may get hairy. Instead of absorbing the expressive marks directly in the rhythm model, they can also be added as meta-information, e.g. in the form of a second (binary) number where each bit represents presence or absence of a given expressive mark.
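
One possible reading of the meta-information idea, sketched in Python: keep the rhythm digits as they are, and add one binary number per expressive mark with one bit per step. Both the layout (most significant bit is the first step) and the helper `marked_steps` are assumptions of mine, not a fixed part of the system.

```python
def marked_steps(digits: str, mask: int):
    """Pair each step digit with its bit in `mask` (most significant bit = first step)."""
    n = len(digits)
    return [(d, bool((mask >> (n - 1 - i)) & 1)) for i, d in enumerate(digits)]

snare = "0101"
ghost_notes = 0b0001  # hypothetical example: only the last snare hit is a ghost note

for digit, is_ghost in marked_steps(snare, ghost_notes):
    print(digit, "ghost" if is_ghost else "")
```

With this layout every integer is still a valid rhythm, and marks set on rests can simply be ignored.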

So how do I use this in my generative music?

It's up to you how you use the representation to create music. Here are some possibilities.
  • You can generate random integers and interpret them as drum pattern voices.
  • You can start from an integer, and use rewriting rules like the ones shown above to upsample to a higher resolution. By using rewriting rules other than the ones present in the previous section you can systematically calculate variations on the starting pattern. E.g. try 21 -> 11, 22 -> 11 or 22 -> 21 to break ties, or 21 -> 10 to replace a longer duration with a shorter one.
  • Instead of using rewriting rules, you can also use systematic calculations on the decimal representations (or representation in any other number base really), and interpret the results as rhythms again. In that case the variations are still systematic, but most likely more unpredictable to an observer.
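
To make the first two possibilities concrete, here is a minimal Python sketch. All names (`to_base3`, `random_voice`, `vary`) are hypothetical helpers, the seed is arbitrary, and applying the variation rules to aligned pairs of digits is my own assumption about how to apply them.

```python
import random

def to_base3(value: int, length: int) -> str:
    """Write a non-negative integer as base-3 digits, zero-padded on the left to `length`."""
    digits = ""
    while value > 0:
        value, d = divmod(value, 3)
        digits = str(d) + digits
    return digits.rjust(length, "0")

def random_voice(length: int, rng: random.Random) -> str:
    """Pick a random integer and read it as a voice, cleaning up tie-to-rest '20'."""
    return to_base3(rng.randrange(3 ** length), length).replace("20", "10")

def vary(digits: str, rules: dict) -> str:
    """Apply pair-rewriting rules to aligned groups of two digits; other pairs stay as-is."""
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return "".join(rules.get(pair, pair) for pair in pairs)

rng = random.Random(2015)  # fixed seed so the run is repeatable
print(random_voice(8, rng))

# Break the ties in the upsampled kick pattern from the resolution section.
print(vary("21002100", {"21": "11"}))          # 11001100
print(vary("2221000022210000", {"22": "21"}))  # 2121000021210000
```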