Algorithmic Clustering of Music
[Figure: clustering tree with leaf labels BachWTK2P1, ChopPrel15, GilleTunisia, GershSumm, ColtrLazybird, Miles7steps, MilesMilesto, MilesSowhat, ChopPrel24, ColtrBlueTr, ChopPrel22, MilesSolar, ChopPrel1, ColtrImpres, HendrixVoodoo, RushYyz, BachWTK2P2, BachWTK2F1, BachWTK2F2, PoliceMess, ColtrGiantStp, DireStMoney, MetalOne, DebusBerg1, BeatlMich, ClaptonCoca, DebusBerg4, BeatlEleanor, LedZStairw, ClaptonLayla, HendrixJoe, DebusBerg2, PoliceBreath, DebusBerg3]
[Figure: clustering tree with leaf labels DebusBerg1, DebusBerg3, DebusBerg4, DebusBerg2, ChopPrel15, BachWTK2F2, ChopPrel22, ChopPrel1, BachWTK2F1, ChopPrel24, BachWTK2P2, BachWTK2P1]

ideas about musical similarity is a separate issue, which we do not address here.

5. Summary and conclusion

In this paper we reported on experiments that cluster sets of MIDI files by means of compression. The intuitive idea is that two files are closer to the extent that one can be compressed better given the other. Thus the notion of compression induces a similarity metric on strings in general and MIDI files in particular. Our method derives from the notion of Kolmogorov complexity, which describes the ultimate limits of compression. As a theoretical approach this is provably universal and optimal. The actual implementation, however, is by necessity non-optimal because the uncomputable Kolmogorov complexity has to be replaced by some practical compressor (we used bzip2 here, though others give similar results). We described various experiments where we first computed the matrix of pairwise distances between the various MIDI files involved, and then used a new heuristic tree construction algorithm to lay out the pieces in a tree, in accordance with the computed distances. We want to stress again that our method does not rely on any music-theoretical knowledge or analysis, but only on general-purpose compression techniques. The versatility and general-purpose nature of our method is also exemplified by the range of later experiments reported in the subsequent paper [9].

References

[1] D. Benedetto, E. Caglioti, and V. Loreto. Language trees and zipping. Physical Review Letters, 88(4):048702, 2002.
[2] Ph. Ball. Algorithm makes tongue tree. Nature, January 22, 2002.
[3] C.H. Bennett, P. Gacs, M. Li, P.M.B. Vitanyi, and W. Zurek. Information distance. IEEE Transactions on Information Theory, 44(4):1407-1423, 1998.
[4] C.H. Bennett, M. Li, B. Ma. Chain letters and evolutionary histories. Scientific American, 76-81, June 2003.
[5] D. Bryant, V. Berry, P. Kearney, M. Li, T. Jiang, T. Wareham, and H. Zhang. A practical algorithm for recovering the best
supported edges of an evolutionary tree. Proc. 11th ACM-SIAM Symposium on Discrete Algorithms, 287-296, 2000.
[6] M. Burrows and D.J. Wheeler. A block-sorting lossless data compression algorithm. Technical report 124, Digital Equipment Corporation, Palo Alto, California, 1994.
[7] W. Chai and B. Vercoe. Folk music classification using hidden Markov models. Proc. of International Conference on Artificial Intelligence, 2001.
[8] R. Cilibrasi, P. Vitanyi, and R. de Wolf. Algorithmic clustering of music. http://arxiv.org/abs/cs.SD/0303025. Different and extended version accepted for publication in Computer Music Journal.
[9] R. Cilibrasi and P. Vitanyi. Clustering by compression. http://arxiv.org/abs/cs.CV/0312044
[10] CompLearn Toolkit :: Machine Learning Via Compression, written by R. Cilibrasi, http://complearn.sourceforge.net/
[11] G. Cormode, M. Paterson, S. Sahinalp, and U. Vishkin. Communication complexity of document exchange. Proc. 11th ACM-SIAM Symposium on Discrete Algorithms, pp. 197-206, 2000.
[12] R. Dannenberg, B. Thom, and D. Watson. A machine learning approach to musical style recognition. Proc. International Computer Music Conference, pp. 344-347, 1997.
[13] S. Dubnov, G. Assayag, O. Lartillot, and G. Bejerano. Using machine-learning methods for musical style modeling. Computer 36(10):73-80, 2003. IEEE.
[14] A. Ghias, J. Logan, D. Chamberlin, and B.C. Smith. Query by humming: Musical information retrieval in an audio database. Proc. of ACM Multimedia Conference, pp. 231-236, 1995.
[15] M. Grimaldi, A. Kokaram, and P. Cunningham. Classifying music by genre using the wavelet packet transform and a round-robin ensemble. Technical report TCD-CS-2002-64, Trinity College Dublin, 2002. http://www.cs.tcd.ie/publications/tech-reports/reports.02/TCD-CS-2002-64.pdf
[16] T. Jiang, P. Kearney, and M. Li. A polynomial time approximation scheme for inferring evolutionary trees from quartet topologies and its application. SIAM J. Computing, 30(6):1942-1961, 2001.
[17] M. Li, J.H. Badger, X. Chen, S. Kwong, P. Kearney, and H. Zhang. An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics, 17(2):149-154, 2001.
[18] M. Li and P.M.B. Vitanyi. Algorithmic complexity. In International Encyclopedia of the Social & Behavioral Sciences, pp. 376-382, N.J. Smelser and P.B. Baltes, Eds., Pergamon, Oxford, 2001/2002.
[19] M. Li, X. Chen, X. Li, B. Ma, P. Vitanyi. The similarity metric. Proc. 14th ACM-SIAM Symposium on Discrete Algorithms, pp. 863-872, 2003.
[20] M. Li and P.M.B. Vitanyi. An Introduction to Kolmogorov Complexity and its Applications. Springer-Verlag, New York, 2nd Edition, 1997.
[21] A. Londei, V. Loreto, and M. O. Belardinelli. Musical style and authorship categorization by informative compressors. Proc. 5th Triennial ESCOM Conference, pp. 200-203, 2003.
[22] H. Muir. Software to unzip identity of unknown composers. New Scientist. April 12, 2003.
[23] K. Orpen and D. Huron. Measurement of similarity in music: A quantitative approach for non-parametric representations. Computers in Music Research 4, 1-44, 1992.
[24] P. Scott. Music classification using neural networks, 2001. http://www.stanford.edu/class/ee373a/musicclassification.pdf
[25] Shared Information Distance or Software Integrity Detection, Computer Science, University of California, Santa Barbara, http://dna.cs.ucsb.edu/SID/
[26] G. Tzanetakis and P. Cook. Music genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293-302, 2002.
[27] K. Patch. Software sorts tunes. Technology Research News, April 23/30, 2003.
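
Illustrative sketch. The first step summarized in Section 5, computing a matrix of pairwise compression-based distances with bzip2, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the distance formula is assumed to be the normalized compression distance in the form given in the follow-up work [9], the function names (compressed_size, ncd, distance_matrix) are hypothetical, and short synthetic byte strings stand in for MIDI files. The tree construction step is not shown.

```python
import bz2
import random
from itertools import combinations
from typing import Dict, Tuple


def compressed_size(data: bytes) -> int:
    """Length in bytes of the bzip2-compressed input (a stand-in for K)."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (assumed form, following [9]):
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: Dict[str, bytes]) -> Dict[Tuple[str, str], float]:
    """Pairwise distances, keyed by name pairs in sorted order."""
    names = sorted(items)
    return {(a, b): ncd(items[a], items[b]) for a, b in combinations(names, 2)}


# Synthetic stand-ins for files (assumption: real input would be MIDI data).
rng = random.Random(0)
base = bytes(rng.randrange(256) for _ in range(4000))
# "variant" shares its first 3900 bytes with "base".
variant = base[:3900] + bytes(rng.randrange(256) for _ in range(100))
# "unrelated" shares no structure with "base".
unrelated = bytes(rng.randrange(256) for _ in range(4000))

matrix = distance_matrix(
    {"base": base, "variant": variant, "unrelated": unrelated}
)
for pair, dist in sorted(matrix.items()):
    print(pair, round(dist, 3))
```

In this toy setting the pair that shares most of its content should come out closer than the unrelated pair, which is the behavior a tree construction step would then exploit.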