Algorithmic Clustering of Music



Rudi Cilibrasi Paul Vitanyi Ronald de Wolf


CWI
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
{cilibrar, paulv, rdewolf}@cwi.nl

Abstract

We present a method for hierarchical music clustering, based on compression of strings that represent the music pieces. The method uses no background knowledge about music whatsoever: it is completely general and can, without change, be used in different areas like linguistic classification, literature, and genomics. Indeed, it can be used to simultaneously cluster objects from completely different domains, like with like. It is based on an ideal theory of the information content in individual objects (Kolmogorov complexity), information distance, and a universal similarity metric. The approximation to the universal similarity metric obtained using standard data compressors is called normalized compression distance (NCD). Experiments using our CompLearn software tool show that the method distinguishes between various musical genres and can even cluster pieces by composer.

* Supported in part by NWO, the NoE QUIPROCONE IST-1999-29064, the ESF QiT Programme, and the EU Fourth Framework BRA NeuroCOLT II Working Group EP 27150.

1. Introduction

The amount of digitized music available on the internet has grown dramatically in recent years, both in the public domain and on commercial sites. Napster and its clones are prime examples. Websites offering musical content in some form or other (MP3, MIDI, ...) need a way to organize their wealth of material; they need to somehow classify their files according to musical genres and subgenres, putting similar pieces together. The purpose of such organization is to enable users to navigate to pieces of music they already know and like, but also to give them advice and recommendations ("If you like this, you might also like..."). Currently, such organization is mostly done manually by humans, but some recent research has been looking into the possibilities of automating music classification.

A human expert, comparing different pieces of music with the aim to cluster likes together, will generally look for certain specific similarities. Previous attempts to automate this process do the same. Generally speaking, they take a file containing a piece of music and extract from it various specific numerical features, related to pitch, rhythm, harmony, etc. One can extract these using for instance Fourier transforms [26] or wavelet transforms [15]. The feature vectors corresponding to the various files are then classified or clustered using existing classification software, based on various standard statistical pattern recognition classifiers [26], Bayesian classifiers [12], hidden Markov models [7], ensembles of nearest-neighbor classifiers [15], or neural networks [12, 24]. For example, one feature would be to look for rhythm in the sense of beats per minute. One can make a histogram where each histogram bin corresponds to a particular tempo in beats-per-minute and the associated peak shows how frequent and strong that particular periodicity was over the entire piece. In [26] we see a gradual change from a few high peaks to many low and spread-out ones going from hip-hop, rock, jazz, to classical. One can use this similarity type to try to cluster pieces in these categories. However, such a method requires specific and detailed knowledge of the problem area, since one needs to know what features to look for.

Our aim is much more general. We do not look for similarity in specific features known to be relevant for classifying music; instead we apply a general mathematical theory of similarity. The aim is to capture, in a single similarity metric, every effective metric: effective versions of Hamming distance, Euclidean distance, edit distances [23], Lempel-Ziv distance [11], and so on. This metric should be so general that it works in every domain: music, text, literature, programs, genomes, executables, natural language determination, equally and simultaneously. Such a metric would be able to simultaneously detect all similarities between pieces that other effective metrics can detect. Rather surprisingly, such a universal metric indeed exists. It was developed in [17, 18, 19], based on the information distance of [20, 3]. Roughly speaking, two objects are deemed
close if we can significantly compress one given the information in the other, the idea being that if two pieces are more similar, then we can more succinctly describe one given the other. The underlying mathematical theory is provably universal and is based on the ideal notion of Kolmogorov complexity, which unfortunately is not effectively computable. We replace the ideal but noncomputable Kolmogorov-based version by standard compression techniques. Theoretical analysis of the application of real-world compressors is given in [9]. (In contrast, a later and partially independent approach of [1, 2] for building language trees, while citing [20, 3], is by ad hoc arguments about empirical Shannon entropy and Kullback-Leibler distance, resulting in non-metric distances.) The resulting distance is called normalized compression distance (NCD), and resulted in an open source software package [10] freely available on the web. The NCD metric appears to be truly universal in practice: it works well on all concrete examples we tried in very different application fields: the first completely automatic construction of the phylogeny tree based on whole mitochondrial genomes [17, 18, 19], a completely automatic construction of a language tree for over 50 Euro-Asian languages [19], detecting plagiarism in student programming assignments [25], and phylogeny of chain letters [4], literature, astronomy, OCR [9].

In this paper we apply this compression-based method to the hierarchical clustering of pieces of music. We use hierarchical clustering because visual representation of the natural-data distance matrix using multidimensional scaling gives higher distortion of the distances involved, and is less informative, than hierarchical clustering; see [9]. We perform various experiments on sets of mostly classical pieces given as MIDI (Musical Instrument Digital Interface) files. This contrasts with most earlier research, where the music was digitized in some wave format or other, and often received in mp3 or other compressed format. We compute the distances between all pairs of pieces, and then build a tree containing those pieces in a way that is consistent with those distances. First, as proof of principle, we run the program on three artificially generated data sets, where we know what the final answer should be. The program indeed classifies these perfectly. Secondly, we show that our program can distinguish between various musical genres (classical, jazz, rock) quite well. Thirdly, we experiment with various sets of classical pieces. The results are good (in the sense of conforming to our expectations) for small sets of data, but tend to be worse for large sets. This is an unavoidable consequence of the fact that while n objects with distances can be faithfully represented in n-dimensional space, representing the relations in 2-dimensional space (multidimensional clustering), or hierarchical (ternary) tree clustering, induces unavoidable distortion (like the Mercator projection of the earth sphere to a two-dimensional map). Increasing the number of objects also increases the number of distance requirements that cannot be satisfied, and hence the distortion. The method has received considerable media attention, e.g. [22, 27].

Related work: After a first version of our paper appeared on a preprint server [8], we learned of recent independent experiments on MIDI files [21]. Here, the matrix of distances is computed using the alternative compression-based approach of [1, 2] and the files are clustered on a Kohonen map rather than a tree. Their first experiment takes 17 classical piano pieces as input, and gives a clustering of comparable quality to ours. Their second experiment is on a set of 48 short artificial musical pieces ("stimuli"), and clusters these reasonably well in 8 categories.

Another very interesting line of music research using compression-based techniques may be found in the survey [13] and the references therein. Here the aim is not to cluster similar musical pieces together, but to model the musical style of a given MIDI file. For instance, given (part of) a piece by Bach, one would like to predict how the piece continues, or to algorithmically generate new pieces of music in the same style. Techniques based on Lempel-Ziv compression do a surprisingly good job at this.

A third related line of work is the area of "query by humming", for which see [14] and many later papers. Here a user hums a tune, and a program is supposed to find the piece of music (in some database) that is closest to the hummed tune. Clearly, any such approach will involve some quantitative measure of similarity. However, we are not aware of any compression-based similarity measure being used in this area.

2. Algorithmic Clustering

2.1. Kolmogorov complexity

Each object (in the application of this paper: each piece of music) is coded as a string x over a finite alphabet, say the binary alphabet. The integer K(x) gives the length of the shortest compressed binary version from which x can be fully reproduced, also known as the Kolmogorov complexity of x. "Shortest" means the minimum taken over every possible decompression program, the ones that are currently known as well as the ones that are possible but currently unknown. We explicitly write only "decompression" because we do not even require that there is also a program that compresses the original file to this compressed version; if there is such a program, then so much the better.

So K(x) gives the length of the ultimate compressed version, say x*, of x. This can be considered as the amount of information, number of bits, contained in the string. Similarly, K(x|y) is the minimal number of bits (which we may
think of as constituting a computer program) required to reconstruct x from y. In a way K(x) expresses the individual "entropy" of x: the minimal number of bits to communicate x when sender and receiver have no knowledge of where x comes from. For example, to communicate Mozart's "Zauberflote" from a library of a million items requires at most 20 bits (2^20 ≈ 1,000,000), but to communicate it from scratch requires megabits. For more details on this pristine notion of individual information content we refer to [20].

2.2. Distance-based classification

As mentioned, our approach is based on a new very general similarity distance, classifying the objects in clusters of objects that are close together according to this distance. In mathematics, lots of different distances arise in all sorts of contexts, and one usually requires these to be a "metric", since otherwise undesirable effects may occur. A metric is a distance function D(.,.) that assigns a non-negative distance D(a,b) to any two objects a and b, in such a way that D(a,b) = 0 only where a = b, D(a,b) = D(b,a) (symmetry), and D(a,b) <= D(a,c) + D(c,b) (triangle inequality). We are interested in "similarity" metrics. For example, if the objects are classical music pieces then the function with D(a,b) = 0 if a and b are by the same composer and D(a,b) = 1 otherwise, is a similarity metric, albeit a somewhat elusive one. This captures only one, but quite a significant, similarity aspect between music pieces.

In [19], a new theoretical approach to a wide class of similarity metrics was proposed: their "normalized information distance" is a metric, and is universal in the sense that this single metric uncovers all similarities simultaneously that the metrics in the class uncover separately. This should be understood in the sense that if two pieces of music are similar (that is, close) according to the particular feature described by a particular metric, then they are also similar (that is, close) in the sense of the normalized information distance metric. This justifies calling the latter the similarity metric. Oblivious to the problem area concerned, simply using the distances according to the similarity metric, our method fully automatically classifies the objects concerned, be they music pieces, text corpora, or genomic data.

More precisely, the approach is as follows. Each pair of such strings x and y is assigned a distance

    d(x,y) = max{K(x|y), K(y|x)} / max{K(x), K(y)}.    (1)

There is a natural interpretation to d(x,y): If, say, K(y) >= K(x), then we can rewrite

    d(x,y) = (K(y) - I(x:y)) / K(y),

where I(x:y) is the information in y about x, satisfying the symmetry property I(x:y) = I(y:x) up to a logarithmic additive error [20]. That is, the distance d(x,y) between x and y is the number of bits of information that is not shared between the two strings per bit of information that could be maximally shared between the two strings.

It is clear that d(x,y) is symmetric, and in [19] it is shown that it is indeed a metric. Moreover, it is universal in the sense that every metric expressing some similarity that can be computed from the objects concerned is comprised (in the sense of minorized) by d(x,y). It is these distances that we will use, albeit in the form of a rough approximation: for K(x) we simply use standard compression software like gzip, bzip2, or PPMZ. To compute the conditional version K(x|y) we use a sophisticated theorem, known as "symmetry of algorithmic information" in [20]. This says

    K(y|x) ≈ K(xy) - K(x),    (2)

so to compute the conditional complexity K(x|y) we can just take the difference of the unconditional complexities K(xy) and K(y). The theory based on real-world compressors is developed in [9], for a new theoretical class of compressors called "normal". For our "normal" real-world compressors (like gzip, bzip2, PPMZ) the resulting variant of d(x,y) is called the normalized compression distance (NCD). It is shown to be a metric, we don't need (2), and we have approximate universality. (The Kolmogorov complexity case represents the "ultimate" normal compressor.) The NCD can be computed using our freely available CompLearn Toolkit [10].

2.3. Our quartet method

The above approach allows us to compute the distance between any pair of objects (any two pieces of music). We now need to cluster the objects, so that objects that are similar according to our metric are placed close together. Multidimensional non-hierarchical clustering turns out to distort the clusters and give poor visual information. We chose hierarchical clustering in ternary trees (each internal node has three edges) to represent the hierarchical information in the distance matrix as well as possible [9]. We need a sensitive method to extract the information contained in the distance matrix. For example, our experiments showed that reconstructing a minimum spanning tree is not sensitive enough and gives poor results. The quartet method, in contrast, is quite sensitive. The idea is as follows: we consider every group of four elements from our set of n elements (in this case, musical pieces); there are n choose 4 such groups. From each group u, v, w, x we construct a tree of arity 3, which implies that the tree consists of two subtrees of two leaves each. Let us call such a tree a quartet. There are three possibilities, denoted (i) uv|wx, (ii) uw|vx, and (iii) ux|vw, where a vertical bar divides the two pairs of leaf nodes into two disjoint subtrees.
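The pairwise distances that feed the quartet method are the compressor-based approximation of d(x,y) from Section 2.2. A minimal sketch of that computation (our illustration, not the CompLearn implementation; it uses Python's bz2 module in place of the bzip2 binary, and the approximation K(x|y) ≈ K(xy) - K(y) in place of equation (2)):

```python
import bz2

def C(data: bytes) -> int:
    """Approximate the Kolmogorov complexity K by the bzip2-compressed length."""
    return len(bz2.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance.

    Instantiates d(x, y) = max{K(x|y), K(y|x)} / max{K(x), K(y)} with a real
    compressor via K(x|y) ~ K(xy) - K(y), which gives
        NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    """
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

On similar inputs the value is near 0; on unrelated inputs it approaches (and with real compressors may slightly exceed) 1.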
For any given tree T and any group of four leaf labels u, v, w, x, we say T is consistent with uv|wx if and only if the path from u to v does not cross the path from w to x. Note that exactly one of the three possible quartets for any set of 4 labels must be consistent for any given tree. We may think of a large tree having many smaller quartet trees embedded within its structure. We formulate a possibly novel, sensitive, cost optimization problem: The cost of a quartet is defined as the sum of the distances between each pair of neighbors; that is, C_{uv|wx} = d(u,v) + d(w,x). The total cost C_T of a tree T with a set N of leaves (external nodes of degree 1) is defined as the sum of C_{uv|wx} over all {u,v,w,x} contained in N such that T is consistent with uv|wx, i.e., the sum of the costs of all its consistent quartets. First, we generate a list of all possible quartets for all four-tuples of labels under consideration. For each group of three possible quartets for a given set of four labels u, v, w, x, calculate a best (minimal) cost m(u,v,w,x) = min{C_{uv|wx}, C_{uw|vx}, C_{ux|vw}}, and a worst (maximal) cost M(u,v,w,x) = max{C_{uv|wx}, C_{uw|vx}, C_{ux|vw}}. Summing all best quartets yields the best (minimal) cost m, the sum of m(u,v,w,x) over all {u,v,w,x} contained in N. Conversely, summing all worst quartets yields the worst (maximal) cost M, the sum of M(u,v,w,x) over all {u,v,w,x} contained in N. For some distance matrices, these minimal and maximal values cannot be attained by actual trees; however, the score C_T of every tree T will lie between these two values. In order to be able to compare tree scores in a more uniform way, we now rescale the score linearly such that the worst score maps to 0, and the best score maps to 1, and term this the normalized tree benefit score S(T) = (M - C_T)/(M - m). Our goal is to find a full tree with a maximum value of S(T), which is to say, the lowest total cost. This optimization problem reduces to problems that are known to be NP-hard [16], which means that it is infeasible in practice, but we can sometimes solve it and always approximate it. Adapting current methods in [5] results in far too computationally intensive calculations; they run many months or years on moderate-sized problems of 30 objects. We have designed a simple heuristic method for our problem based on randomization and hill-climbing. First, a random tree with 2n - 2 nodes is created, consisting of n leaf nodes (with 1 connecting edge) labeled with the names of musical pieces, and n - 2 non-leaf or internal nodes. Each internal node has exactly three connecting edges. For this tree T, we calculate the total cost of all consistent quartets, and invert and scale this value to find S(T). Typically, a random tree will be consistent with around 1/3 of all quartets. Now, this tree is denoted the currently best known tree, and is used as the basis for further searching. We define a simple mutation on a tree as one of the three possible transformations:

1. A leaf swap, which consists of randomly choosing two leaf nodes and swapping them.

2. A subtree swap, which consists of randomly choosing two internal nodes and swapping the subtrees rooted at those nodes.

3. A subtree transfer, whereby a randomly chosen subtree (possibly a leaf) is detached and reattached in another place, maintaining arity invariants.

Each of these simple mutations keeps invariant the number of leaf and internal nodes in the tree; only the structure and placements change. Define a full mutation as a sequence of at least one but potentially many simple mutations, picked according to the following distribution. First we pick the number k of simple mutations that we will perform with probability 2^(-k). For each such simple mutation, we choose one of the three types listed above with equal probability. Finally, for each of these simple mutations, we pick leaves or internal nodes, as necessary. Notice that trees which are close to the original tree (in terms of number of simple mutation steps in between) are examined often, while trees that are far away from the original tree will eventually be examined, but not very frequently. So in order to search for a better tree, we simply apply a full mutation on T to arrive at T', and then calculate S(T'). If S(T') > S(T), then keep T' as the new best tree. Otherwise, try a new different tree and repeat. If S(T') ever reaches 1, then halt, outputting the best tree. Otherwise, run until it seems no better trees are being found in a reasonable amount of time, in which case the approximation is complete.

Note that if a tree is ever found such that S(T) = 1, then we can stop because we can be certain that this tree is optimal, as no tree could have a lower cost. In fact, this perfect tree result is achieved in our artificial tree reconstruction experiment (Section 4.1) reliably in less than ten minutes. For real-world data, S(T) reaches a maximum somewhat less than 1, presumably reflecting inconsistency in the distance matrix data fed as input to the algorithm, or indicating a search space too large to solve exactly. On many typical problems of up to 40 objects this tree search gives a tree with S(T) >= 0.9 within half an hour. Progress occurs typically in a sigmoidal fashion towards a maximal value ≈ 1. For large numbers of objects, tree scoring itself can be slow (as this takes on the order of n^4 computation steps), and the space of trees is also large, so the algorithm may slow down substantially. For larger experiments, we use a C++/Ruby implementation with MPI (Message Passing Interface, a common standard used on massively parallel computers) on a cluster of workstations in parallel to find trees more rapidly.

3. Details of our implementation

The software that we developed for the experiments of this paper (and for later experiments reported in [9]) is
freely available [10]. We downloaded 118 separate MIDI files selected from a range of classical composers, as well as some popular music. We then preprocessed these MIDI files to make them more uniform. This is done to keep the experiments "honest": We want to analyze just the musical component, not the title indicator in the MIDI file, nor the sequencer's name, or author/composer's name, nor the sequencing program used, nor any of the many other non-musical data that can be incorporated in the MIDI file. We strip off this information from the MIDI file to avoid detecting similarity between files for non-musical reasons, for example, like being prepared by the same source.

The preprocessor extracts just MIDI Note-On and Note-Off events. These events were then converted to a player-piano style representation, with time quantized in 0.05 second intervals. All instrument indicators, MIDI Control signals, and tempo variations were ignored. For each track in the MIDI file, we calculate two quantities: an average volume and a modal note ("modal" is used here in a statistical sense, not in a musical sense). The average volume is calculated by averaging the volume (MIDI Note velocity) of all notes in the track. The modal note is defined to be the note pitch that sounds most often in that track. If this is not unique, then the lowest such note is chosen. The modal note is used as a key-invariant reference point from which to represent all notes. It is denoted by 0, higher notes are denoted by positive numbers, and lower notes are denoted by negative numbers. A value of 1 indicates a half-step above the modal note, and a value of -2 indicates a whole-step below the modal note. The modal note is written as the first byte of each track. For each track, we iterate through each 0.05-sec time sample in order, outputting a single signed 8-bit value for each currently sounding note (ordered from lowest to highest). Two special values are reserved to represent the end of a time step and the end of a track. The tracks are sorted according to decreasing average volume, and then output in succession. The preprocessing phase does not significantly alter the musical content of the MIDI file: the preprocessed file sounds almost the same as the original.

These preprocessed MIDI files are then used as input to the compression stage for distance matrix calculation and subsequent tree search. We chose to use the compression program bzip2 for our experiments. Unlike the well-known dictionary-based Lempel-Ziv compressors, bzip2 transforms a file into a data-dependent permutation of itself by using the Burrows-Wheeler Transform [6]. Very briefly: The BWT operates on the file as well as on all of its rotations; the algorithm sorts this block of rotations and uses a move-to-front encoding scheme to focus the redundancy in the file into simple statistical biases that can be used by an entropy coder in the output stage without context.

The resulting matrix of pairwise distances is then fed into our tree construction program, described in detail in the previous section, which lays out the experiments' MIDI files in tree format. Everything runs on 1.5GHz Pentiums with an insignificant memory footprint.

4. Results

4.1. Three controlled experiments

With the natural data sets of music pieces that we use, one may have the preconception (or prejudice) that music by Bach should be clustered together, music by Chopin should be clustered together, and so should music by rock stars. However, the preprocessed music files of a piece by Bach and a piece by Chopin, or the Beatles, may resemble one another more than two different pieces by Bach, by accident or indeed by design and copying. Thus, natural data sets may have ambiguous, conflicting, or counterintuitive outcomes. In other words, the experiments on actual pieces have the drawback of not having one clear "correct" answer that can function as a benchmark for assessing our experimental outcomes. Before describing the experiments we did with MIDI files of actual music, we discuss three experiments that show that our program indeed does what it is supposed to do, at least in artificial situations where we know in advance what the correct answer is. The similarity machine consists of two parts: (i) extracting a distance matrix from the data, and (ii) constructing a tree from the distance matrix using our novel quartet-based heuristic.

Testing the quartet-based tree construction: We first test whether the quartet-based tree construction heuristic is trustworthy: We generated a random ternary tree T with 18 leaves, and derived a distance metric from it by defining the distance between two nodes as follows: Given the length of the path from a to b, in an integer number of edges, as L(a,b), let

    d(a,b) = (L(a,b) + 1) / 18,

except when a = b, in which case d(a,b) = 0. It is easy to verify that this simple formula always gives a number between 0 and 1, and is monotonic with path length. Given only the 18 x 18 matrix of these normalized distances, our quartet method exactly reconstructed the original tree with S(T) = 1.

Testing the similarity machine on artificial data: Given that the tree reconstruction method is accurate on clean consistent data, we tried whether the full procedure works in an acceptable manner when we know what the outcome should be like. For reasons of space we omit the details and resulting tree, but note that it had clustering occur exactly as we would expect. The S(T) score is 0.905.
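The distance matrix used in this controlled experiment is easy to derive from a tree. A sketch of the construction (ours, with a small hard-coded 5-leaf ternary tree standing in for the paper's random 18-leaf tree; the divisor 18 is kept from the formula above):

```python
from collections import deque

# A small ternary tree as an adjacency list: internal nodes i0..i2 each
# have exactly three edges; a..e are the leaves (each with one edge).
tree = {
    "i0": ["i1", "i2", "a"],
    "i1": ["i0", "b", "c"],
    "i2": ["i0", "d", "e"],
    "a": ["i0"], "b": ["i1"], "c": ["i1"],
    "d": ["i2"], "e": ["i2"],
}

def path_len(u, v):
    """L(u, v): number of edges on the unique u-v path (BFS suffices in a tree)."""
    dist = {u: 0}
    q = deque([u])
    while q:
        n = q.popleft()
        if n == v:
            return dist[n]
        for m in tree[n]:
            if m not in dist:
                dist[m] = dist[n] + 1
                q.append(m)
    raise ValueError("nodes not connected")

def d(a, b, scale=18):
    """Distance derived from the tree: (L(a,b) + 1) / scale, and 0 when a == b."""
    return 0.0 if a == b else (path_len(a, b) + 1) / scale
```

The resulting values are symmetric, lie between 0 and 1, and grow monotonically with path length, as the text claims.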
Testing the similarity machine on natural data: We test gross classification of files based on markedly different file types. Here, we chose several files: (i) Four mitochondrial gene sequences, from a black bear, polar bear, fox, and rat; (ii) Four excerpts from the novel The Zeppelin's Passenger by E. Phillips Oppenheim; (iii) Four MIDI files without further processing; two from Jimi Hendrix and two movements from Debussy's Suite bergamasque; (iv) Two Linux x86 ELF executables (the cp and rm commands); and (v) Two compiled Java class files. As expected, the program correctly classifies each of the different types of files together with like near like. The result is reported in Figure 1 with S(T) equal to 0.984. This experiment shows the power and universality of the method: no features of any specific domain of application are used.

Figure 1. Classification of different file types

4.2. Music genre classification

The limited available space doesn't allow many pictures, so we briefly discuss the results. For the full paper, see [8]. Before testing whether our program can see the distinctions between various classical composers, we first show that it can distinguish between three broader musical genres: classical music, rock, and jazz. This should be easier than making distinctions within classical music. For the genre experiment we used 12 classical pieces from Bach, Chopin, and Debussy, 12 jazz pieces from Miles Davis, John Coltrane and the like, and 12 rock pieces from The Beatles, The Police, etc. The output tree (Figure 2) has S(T) score 0.858. All musical pieces used are listed in the tables in the full paper.

The discrimination between the 3 genres is good but not perfect. The upper-right branch of the tree contains 10 of the 12 jazz pieces, but also Chopin's Prelude no. 15 and a Bach Prelude. The two other jazz pieces, Miles Davis's "So what" and John Coltrane's "Giant steps", are placed elsewhere in the tree, perhaps according to some kinship that now escapes us but can be identified by closer studying of the objects concerned. Of the rock pieces, 9 are placed close together in the lower-left branch, while Hendrix's "Voodoo chile", Rush's "Yyz", and Dire Straits' "Money for nothing" are further away. In the case of the Hendrix piece this may be explained by the fact that it hovers between the jazz and rock genres. Most of the classical pieces are in the lower-right and middle part of the tree. Surprisingly, 2 of the 4 Bach pieces are placed elsewhere. It is not clear why this happens and may be considered an error of our program, since we perceive the 4 Bach pieces to be very close, both structurally and melodically. However, Bach's music is seminal and has been copied and cannibalized in all kinds of recognizable or hidden manners; closer scrutiny could reveal likenesses in its present company that are not now apparent to us. In effect our similarity engine aims at the ideal of a perfect data mining process, discovering unknown features in which the data can be similar.

4.3. Classical piano music

We then tested our method on three sets, of increasing size, of classical piano music. The smallest set encompasses the 4 movements from Debussy's Suite bergamasque, 4 movements of book 2 of Bach's Wohltemperierte Klavier, and 4 preludes from Chopin's opus 28. As one can see in Figure 3, our program does a pretty good job at clustering these pieces. The S(T) score is also high: 0.958. The 4 Debussy movements form one cluster, as do the 4 Bach pieces. The only imperfection in the tree, judged by what one would intuitively expect, is that Chopin's Prelude no. 15 lies a bit closer to Bach than to the other 3 Chopin pieces. This Prelude no. 15, in fact, consistently forms an odd-one-out in our other experiments as well. There is some musical truth to this, as no. 15 may be perceived as the most eccentric among the 24 Preludes of Chopin's opus 28.

We further tested the method with a medium-sized set that added 20 pieces to the small set, which gave an S(T) score slightly lower than in the small set experiment: 0.895; a large set of 60 pieces, where the S(T) score dropped further from that of the medium-sized set, to 0.844; and more complicated music, namely 34 symphonic pieces, which resulted in an S(T) score of 0.860. In all cases the S(T) score is reliable with respect to what our intuition tells us. Note that a lower S(T) score only indicates that the corresponding matrix of distances is not faithfully represented by the tree. Whether the distance matrix itself satisfies our preconceived
[Figure 2. Output for the 36 pieces from 3 genres]
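To make the distance underlying all of these trees concrete: the normalized compression distance is NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed size of a string. The following is a minimal sketch, not the CompLearn implementation itself: it uses Python's standard `bz2` module in place of the bzip2 tool used in our experiments, the byte strings are illustrative stand-ins for MIDI files, and `distance_matrix` is a helper name introduced here.

```python
import bz2
from itertools import combinations

def compressed_size(data: bytes) -> int:
    """Length in bytes of the bzip2-compressed input, approximating C(data)."""
    return len(bz2.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def distance_matrix(items):
    """Symmetric matrix of pairwise NCDs, the input to tree construction."""
    n = len(items)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = ncd(items[i], items[j])
    return d

# Illustrative stand-ins for MIDI files: two near-identical repetitive
# strings and one unrelated byte pattern.
a = b"the quick brown fox jumps over the lazy dog; " * 50
b = b"the quick brown fox jumps over the lazy cat; " * 50
c = bytes(range(256)) * 20

m = distance_matrix([a, b, c])
# The near-copy pair (a, b) should come out closer than (a, c).
print(m[0][1] < m[0][2])
```

From such a matrix, the trees shown here are then laid out by the quartet-based heuristic of the CompLearn toolkit [10]; the S(T) scores reported above measure how faithfully a resulting tree represents the matrix.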
ideas about musical similarity is a separate issue, which we do not address here.

5. Summary and conclusion

In this paper we reported on experiments that cluster sets of MIDI files by means of compression. The intuitive idea is that two files are closer to the extent that one can be compressed better given the other. Thus the notion of compression induces a similarity metric on strings in general and MIDI files in particular. Our method derives from the notion of Kolmogorov complexity, which describes the ultimate limits of compression. As a theoretical approach this is provably universal and optimal. The actual implementation, however, is by necessity non-optimal, because the uncomputable Kolmogorov complexity has to be replaced by some practical compressor (we used bzip2 here, though others give similar results). We described various experiments where we first computed the matrix of pairwise distances between the various MIDI files involved, and then used a new heuristic tree construction algorithm to lay out the pieces in a tree, in accordance with the computed distances. We want to stress again that our method does not rely on any music-theoretical knowledge or analysis, but only on general-purpose compression techniques. The versatility and general-purpose nature of our method is also exemplified by the range of later experiments reported in the subsequent paper [9].

References

[1] D. Benedetto, E. Caglioti, and V. Loreto. Language trees and zipping. Physical Review Letters, 88(4):048702, 2002.
[2] Ph. Ball. Algorithm makes tongue tree. Nature, January 22, 2002.
[3] C.H. Bennett, P. Gacs, M. Li, P.M.B. Vitanyi, and W. Zurek. Information distance. IEEE Transactions on Information Theory, 44(4):1407–1423, 1998.
[4] C.H. Bennett, M. Li, and B. Ma. Chain letters and evolutionary histories. Scientific American, pp. 76–81, June 2003.
[5] D. Bryant, V. Berry, P. Kearney, M. Li, T. Jiang, T. Wareham, and H. Zhang. A practical algorithm for recovering the best
[Figure 3. Output for the 12-piece set]
supported edges of an evolutionary tree. Proc. 11th ACM-SIAM Symposium on Discrete Algorithms, pp. 287–296, 2000.
[6] M. Burrows and D.J. Wheeler. A block-sorting lossless data compression algorithm. Technical report 124, Digital Equipment Corporation, Palo Alto, California, 1994.
[7] W. Chai and B. Vercoe. Folk music classification using hidden Markov models. Proc. International Conference on Artificial Intelligence, 2001.
[8] R. Cilibrasi, P. Vitanyi, and R. de Wolf. Algorithmic clustering of music. http://arxiv.org/abs/cs.SD/0303025. Different and extended version accepted for publication in Computer Music Journal.
[9] R. Cilibrasi and P. Vitanyi. Clustering by compression. http://arxiv.org/abs/cs.CV/0312044
[10] CompLearn Toolkit :: Machine Learning Via Compression, written by R. Cilibrasi, http://complearn.sourceforge.net/
[11] G. Cormode, M. Paterson, S. Sahinalp, and U. Vishkin. Communication complexity of document exchange. Proc. 11th ACM-SIAM Symposium on Discrete Algorithms, pp. 197–206, 2000.
[12] R. Dannenberg, B. Thom, and D. Watson. A machine learning approach to musical style recognition. Proc. International Computer Music Conference, pp. 344–347, 1997.
[13] S. Dubnov, G. Assayag, O. Lartillot, and G. Bejerano. Using machine-learning methods for musical style modeling. Computer, 36(10):73–80, 2003. IEEE.
[14] A. Ghias, J. Logan, D. Chamberlin, and B.C. Smith. Query by humming: Musical information retrieval in an audio database. Proc. ACM Multimedia Conference, pp. 231–236, 1995.
[15] M. Grimaldi, A. Kokaram, and P. Cunningham. Classifying music by genre using the wavelet packet transform and a round-robin ensemble. Technical report TCD-CS-2002-64, Trinity College Dublin, 2002. http://www.cs.tcd.ie/publications/tech-reports/reports.02/TCD-CS-2002-64.pdf
[16] T. Jiang, P. Kearney, and M. Li. A polynomial time approximation scheme for inferring evolutionary trees from quartet topologies and its application. SIAM J. Computing, 30(6):1942–1961, 2001.
[17] M. Li, J.H. Badger, X. Chen, S. Kwong, P. Kearney, and H. Zhang. An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics, 17(2):149–154, 2001.
[18] M. Li and P.M.B. Vitanyi. Algorithmic complexity. In International Encyclopedia of the Social & Behavioral Sciences, pp. 376–382, N.J. Smelser and P.B. Baltes, Eds., Pergamon, Oxford, 2001/2002.
[19] M. Li, X. Chen, X. Li, B. Ma, and P. Vitanyi. The similarity metric. Proc. 14th ACM-SIAM Symposium on Discrete Algorithms, pp. 863–872, 2003.
[20] M. Li and P.M.B. Vitanyi. An Introduction to Kolmogorov Complexity and its Applications. Springer-Verlag, New York, 2nd edition, 1997.
[21] A. Londei, V. Loreto, and M.O. Belardinelli. Musical style and authorship categorization by informative compressors. Proc. 5th Triennial ESCOM Conference, pp. 200–203, 2003.
[22] H. Muir. Software to unzip identity of unknown composers. New Scientist, April 12, 2003.
[23] K. Orpen and D. Huron. Measurement of similarity in music: A quantitative approach for non-parametric representations. Computers in Music Research, 4:1–44, 1992.
[24] P. Scott. Music classification using neural networks, 2001. http://www.stanford.edu/class/ee373a/musicclassification.pdf
[25] Shared Information Distance or Software Integrity Detection, Computer Science, University of California, Santa Barbara, http://dna.cs.ucsb.edu/SID/
[26] G. Tzanetakis and P. Cook. Music genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293–302, 2002.
[27] K. Patch. Software sorts tunes. Technology Research News, April 23/30, 2003.