Please do not remove this page: thank you for downloading this document from the RMIT Research Repository
The RMIT Research Repository is an open access database showcasing the research outputs of RMIT University researchers.
RMIT Research Repository: http://researchbank.rmit.edu.au/
Citation:
https://researchbank.rmit.edu.au/view/rmit:30939
Version: Published Version
Copyright Statement: © 2004 Universitat Pompeu Fabra
Link to Published Version:
http://ismir2004.ismir.net/proceedings/p042-page-224-paper225.pdf
EXPLORING MICROTONAL MATCHING
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. © 2004 Universitat Pompeu Fabra.

For a music representation to accommodate alternative tunings and microtonalism, it needs to provide the facility for the pitches of the tuning system to be defined. We assume here that there is consistency in the tuning across a piece of music or at least a section of the music, so that one tuning definition can be applied to the music.
Below we discuss several methods that can be applied to microtonal representation, and we follow this with the simple approach used for our experimental data.

MIDI. MIDI (Musical Instrument Digital Interface), in terms of music format, is a representation that supports polyphonic music and multitrack recording and playback. It encodes notes as integers.
The Standard MIDI File (SMF) format and General MIDI (GM) define how MIDI data should be stored. MIDI has no support for non-twelve-tone systems. According to Correia and Selfridge-Field, Scholz has proposed an extension to MIDI concerning tuning systems [4]. However, it is not a strict standard. The MIDI Manufacturers Association has published MIDI Tuning Extensions [22]. These are, however, still limited to twelve-tone systems. What users can do is alter the tuning of each pitch. For example, we can detune C by −20 cents, C♯ by 15 cents, D by 50 cents, and so on. With standard MIDI this can be achieved by pitch bend events.

ESAC. ESAC has support for non-twelve-tone systems. For example, Schaffrath defined the tuning of heptatonic scales to allow the encoding of music from many cultures [28].
In ESAC, a pitch is represented as a number. This is applicable to scales originating from most cultures. Encoding is invariant with respect to scale or tuning systems. The tonic (first note) is always represented by "1". The note encoding is similar to solfege [29]. Note durations are stored as relative durations. Besides pitches and note durations, the title, source, and social function of a tune are also stored [29].
With regard to melody retrieval, matching on an ESAC-encoded tune can involve the pitch component only, the note duration component only, or a combination of both. ESAC has been used to identify that the first five notes of the clarinet part of Mozart's second trio from the Clarinet Quintet resemble two German folksongs, "Hoert Ihr Herrn und lasst euch sagen" and "Trauer, Trauer, über Trauer" [29].
Despite being effective for melody retrieval, this representation has a basic limitation: it has no support for polyphonic music.

Humdrum. Huron has created a set of utilities called Humdrum, which is useful for facilitating the posing and answering of music research questions [19]. Humdrum by itself does not represent any music, because it is a syntax [19]. Humdrum is not limited to any particular tuning system. Users may define their own representations. For twelve-tone-system representation, a Humdrum representation called kern has been suggested.
One of the strengths of Humdrum is its support for polyphonic music, which is one of its design considerations. It can represent "sequential and/or concurrent time-dependent discrete symbolic data" [19]. Notes that sound at the same time are called concurrent attributes, while notes that sound sequentially are called sequential events [19].

XML. XML is a rapidly developing technology. There is an abundance of tools supporting the meta-language, such as authoring tools, development libraries, viewers, and converters. It can also be extended by users to fit their own needs. Data definition is made possible by the use of a Document Type Definition (DTD) or an XML Schema [7]. As XML documents can contain semistructured data [7], they can be used to store music. Using the method suggested by Roland [27], notes are described as shown in the example below.

    <note pitch="C4" dur="1" />
    <note pitch="D4" dur="0.5" />
    <note pitch="E4" dur="0.5" />

Adding microtonal information would be a simple matter of defining further XML tags that contain tuning information for the scale.
XML is effective, but inefficient in its raw state. However, despite its verbose nature, XML generally compresses very well. Its efficiency can be increased further by indexing.

MTRI. The Micro-Tonal Representation for Information Retrieval (MTRI) was designed as a music representation method for our microtonal experiments. It consists of the essential elements required for melody, harmony and rhythm representation, but ignores other musical aspects such as loudness. However, the representation is sufficient to capture the recognisable elements of a piece of music. For the sake of brevity we mainly describe the tuning system representation here, as that is the principal concern of this piece of research.
To encode a tune in MTRI, two files are used: MTP (MTRI pitch specification file) and MTS (MTRI score file). An MTP stores information about pitch frequencies, while an MTS stores information about note events.
In an MTP, the parameter N is used to describe the number of notes in the tuning system. As an example, for the equal-tempered twelve-tone system, N = 12 (see Figure 1). Each pitch is stored on one line. When storing pitch names that have the same frequency, all the pitch names must appear before the frequency.
An MTS begins with the directive :Use, which specifies the MTP to be associated with the tune. For modern compositions with tuning systems that change mid-work, it would be simple to extend the representation by allowing the use of the directive wherever a new tuning system is required. The note names are not limited to the letters A to G, so microtonal melodies that divide the octave into more than 12 notes can also be stored.

4. RETRIEVAL

An information retrieval system is only useful when it can answer queries effectively. Information retrieval systems
that support ranked queries measure how similar a query is to items in the database according to some meaning of relevance [41]. Most music retrieval systems, including the one we report here, support ranked queries. In our case, the queries are tunes.
Various matching techniques have been developed to anticipate query vagueness. Dynamic programming, a technique suggested by Mongeau and Sankoff [23], was examined by Uitdenbogerd and Zobel [35]. The first published use of n-grams for melody matching was by Downie [10]. The concept was further examined by Uitdenbogerd [35], Pickens [25] and Doraisamy and Rüger [9]. The two approaches were compared by Uitdenbogerd and Zobel [36], who found that n-grams can be used as a fast alternative to dynamic programming approaches to melody matching without significant loss of effectiveness. An alternative approach is the indexing of notes and applying a look-up of each note in multiple musical keys, with the Chinese remainder theorem for transposition-invariant retrieval [3]. Recent work by Birmingham, Meek, O'Malley, Pardo, and Shifrin [1] uses stochastic models.
Dannenberg, Birmingham, Tzanetakis, Meek, Hu, and Pardo [6] also used HMM (Hidden Markov Models) along with dynamic programming in conjunction with directed modulo-12 standardisation [36] and Inter Onset Interval ratio values. They also tested melodic contour matching. Effectiveness was reported as MRR (Mean Reciprocal Rank) and as the percentage of answers ranked as the first answer, in the top two, and in the top three. Closely related to this work are those by Meek and Birmingham [21] and Shifrin and Birmingham [31], both of which use HMM for searching and MRR to report effectiveness.
Kageyama, Mochizuki, and Takashima [20] used dynamic programming for their query-by-humming retrieval system. Their system also made use of note duration information for melody matching. The query melody and the melodies in the database were transposed for matching. Note duration was used as the weight for the matching score. The effectiveness is reported using the number of melodic samples (out of 100) retrieved as the first answer and in the top ten.
To support comparison of different renditions of the same piece of music, melody standardisation is used [35]. Here, a pitch is not represented exactly as it sounds. This is to support approximate matching. It is analogous to a technique in text retrieval systems called case folding (converting all characters to the same case [41]). There are many possible melody standardisations, but we will only cover the one relevant to this experiment: exact microtonal interval. We also consider incorporating note duration information for matching.
We use approximate string matching [24]. The algorithm to be used is edit distance, also known as Levenshtein distance. We do not use the simplest form of this algorithm, as described by Crochemore and Rytter [5]. Instead, we use a variation called alphabet-weight edit distance [12]. Matching is done in conjunction with contour, directed modulo, and exact microtonal interval standardisations. This is discussed further in Section 4.4.

4.1. Pitch Standardisation

Uitdenbogerd's doctoral thesis [34] discusses various retrieval standardisations, some of which are the basis for our microtone-enabled techniques. Besides contour and exact interval standardisations, the thesis also focuses on directed modulo-12 for the underlying experiments. The directed modulo-12 standardisation represents each note as a numeric value which is the interval in semitones (scaled to a maximum of one octave) relative to the previous note. The value is expressed as:

\[
\rho_{12} \equiv
\begin{cases}
0 & ;\ d = 0 \\
d\,(1 + ((I - 1) \bmod 12)) & ;\ d \neq 0
\end{cases}
\tag{1}
\]

where I is the interval between a note and its previous note (absolute value) and d is 1 if the previous note is lower than the current note, −1 if higher, and 0 otherwise [33, 34]. This is, however, limited to twelve-tone systems. For non-twelve-tone systems, the formula can be generalised so that a note is expressed as:

\[
\rho_{t} \equiv
\begin{cases}
0 & ;\ d = 0 \\
d\,(1 + ((I - 1) \bmod t)) & ;\ d \neq 0
\end{cases}
\tag{2}
\]

where t is the number of tones in the tuning system [33]. This may only work well for equal-tempered tuning systems, and a special scoring technique may need to be developed for matching two tunes having different numbers of tones in their tuning systems.
Exact Microtonal Interval standardisation is an extension of exact interval standardisation as described in Uitdenbogerd and Zobel [36]. In the exact interval standardisation, a note is represented using the number of semitones between itself and its previous note [36]. In contrast, for microtone-enabled matching purposes, we express intervals in cents. As an example, "Melbourne Still Shines" (Figure 2) is represented as "700 400 100 -500 -500 200 300 -200 -100 -200" (see Table 1). Two notes that differ are perceived as "fairly similar" when the frequency difference is less than the just noticeable difference (JNDF) [8, 26]. JNDF is not a linear measure. At 100 Hz, JNDF is 3 Hz, while at 2 000 Hz, JNDF

    N = 12
    C B+ 261.63
    C+ D- 277.18
    D 293.66
    D+ E- 311.13
    E F- 329.63
    F E+ 349.23
    F+ G- 369.99
    G 392.00
    G+ A- 415.30
    A 440.00
    A+ B- 466.16
    B C- 493.88

Figure 1. MTP for equal-tempered twelve-tone system.
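
As a rough illustration of the standardisations in Section 4.1, the sketch below reads pitch data laid out as in Figure 1, converts a note sequence to exact microtonal intervals in cents, and evaluates equation (2) for a signed interval given in scale steps. It is not the implementation used in the experiments. The function names, the in-memory `MTP_TEXT` string, and the handling of octaves (a trailing digit on the note name, with the listed frequencies taken to be octave 4) are assumptions made for this example; the MTS score format is not parsed, and enharmonic spellings that cross an octave boundary (such as B+) are not treated specially.

```python
import math

# Pitch data in the layout of Figure 1 (names first, frequency last).
MTP_TEXT = """\
N = 12
C B+ 261.63
C+ D- 277.18
D 293.66
D+ E- 311.13
E F- 329.63
F E+ 349.23
F+ G- 369.99
G 392.00
G+ A- 415.30
A 440.00
A+ B- 466.16
B C- 493.88
"""

# Note sequence read off the transitions listed in Table 1.
TUNE = ["C4", "G4", "B4", "C5", "G4", "D4", "E4", "G4", "F4", "E4", "D4"]


def parse_mtp(text):
    """Return (N, {pitch name: frequency in Hz}) from MTP-style text."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    n = int(rows[0][-1])  # first row is "N = 12"
    freqs = {}
    for *names, freq in rows[1:]:
        for name in names:  # every alias maps to the same frequency
            freqs[name] = float(freq)
    return n, freqs


def frequency(note, freqs):
    """Assumption: the last character is an octave number, table = octave 4."""
    name, octave = note[:-1], int(note[-1])
    return freqs[name] * 2.0 ** (octave - 4)


def cents_intervals(notes, freqs):
    """Exact microtonal intervals: cents between consecutive notes."""
    hz = [frequency(n, freqs) for n in notes]
    return [round(1200 * math.log2(b / a)) for a, b in zip(hz, hz[1:])]


def directed_modulo(interval_steps, t=12):
    """Equation (2) for a signed interval measured in scale steps."""
    if interval_steps == 0:
        return 0
    d = 1 if interval_steps > 0 else -1
    return d * (1 + (abs(interval_steps) - 1) % t)


n_tones, table = parse_mtp(MTP_TEXT)
print(n_tones, cents_intervals(TUNE, table))
print(directed_modulo(13), directed_modulo(-14), directed_modulo(12))
```

Because the intervals are derived from frequencies rather than from note names, the same `cents_intervals` function would apply unchanged to a pitch table with more or fewer than twelve entries, which is the point of expressing intervals in cents.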
Table 1. Exact microtonal interval standardisation example.

    Transition   ι_c (cents)
    C4-G4         700
    G4-B4         400
    B4-C5         100
    C5-G4        −500
    G4-D4        −500
    D4-E4         200
    E4-G4         300
    G4-F4        −200
    F4-E4        −100
    E4-D4        −200

4.3. Polyphonic Music

Most music is polyphonic in the sense that more than one note sounds simultaneously. This adds extra complexity to the matching process. In our work we treat each track or part of a polyphonic piece as a separate sequence of notes for matching. For example, if a piece consisted of violin, cello and piano parts, the query would be matched against each of these separately. This results in a similarity score for each part. The best one is chosen as the representative score for the piece. Matching against all tracks in this manner was shown to be an effective approach in earlier work [36]. Where there is polyphony within a part no notes are discarded, and the sequence as defined in the original file is retained. While this may be an issue for matching real queries it does not affect the experiments reported here as they involve known-item searches and the query and potential answers are processed identically.
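
The per-part matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring scheme of the experiments: it uses a plain dynamic-programming edit distance with a symbol-dependent substitution cost, it scores by distance (lower is better) rather than by similarity, and the cost function `cents_cost`, the penalty values, and all names are hypothetical.

```python
def weighted_edit_distance(a, b, sub_cost, indel=1.0):
    """Edit distance where substitution cost depends on the two symbols."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [i * indel]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + indel,               # delete x from a
                cur[j - 1] + indel,            # insert y into a
                prev[j - 1] + sub_cost(x, y),  # substitute x with y
            ))
        prev = cur
    return prev[-1]


def cents_cost(x, y):
    """Hypothetical weight: grows with the difference in cents, capped at 1."""
    return min(abs(x - y) / 100.0, 1.0)


def best_part(query, parts):
    """Match the query against every part and keep the best (lowest) distance."""
    return min(
        (weighted_edit_distance(query, seq, cents_cost), name)
        for name, seq in parts.items()
    )


# Interval sequences in cents for a made-up three-part piece.
piece = {
    "violin": [700, 400, 100, -500],
    "cello": [-500, 200],
    "piano": [700, 380, 100],
}
distance, part = best_part([700, 400, 100], piece)
print(part, distance)
```

Taking the minimum over parts mirrors the choice of the best per-part score as the representative score for the piece; a system that reports similarity instead of distance would take the maximum.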
than 1.00 and P10 less than 100.00%, MHFM is also ex-

[Bar chart: one pair of bars per scoring scheme (0_5, 0_5-2, 1, 1-2, 2, 2-2); series: note duration ignored, note duration incorporated; vertical axis from 0.0 to 80.0; horizontal axis: scoring scheme.]

Figure 6. MHFMs for exact microtonal interval standardisation. The bar showing the incorporation of note duration is the best result (lowest MHFM) from three duration contour scoring schemes.

with greater magnitude of insertion/deletion penalty to slightly improve retrieval effectiveness. However, the improvement is insignificant compared to the extra processing required.

7. CONCLUSIONS

Our results demonstrate the applicability of microtone-aware matching techniques to music of various tuning systems. The microtone-aware matching techniques applied in our experiments were non-microtone-aware matching techniques extended to the finer frequency spectrum of music.
The results of our experiments show that:

1. Exact microtonal interval standardisation in conjunction with a microtone-aware scoring is effective for microtonal music information retrieval.

8. REFERENCES

[1] W. Birmingham, C. Meek, K. O'Malley, B. Pardo, and J. Shifrin. Music information retrieval systems. Dr. Dobb's Journal, pages 50–53, September 2003.
[2] D. Byrd, J. S. Downie, T. Crawford, W. B. Croft, and C. Nevill-Manning, editors. International Symposium on Music Information Retrieval, volume 1, Plymouth, Massachusetts, USA, October 2000.
[3] M. Clausen, R. Engelbrecht, D. Meyer, and J. Schmitz. PROMS: A web-based tool for searching in polyphonic music. In Byrd et al. [2].
[4] E. Correia, Jr, E. Selfridge-Field, et al. Glossary. In Selfridge-Field [30], pages 581–610.
[5] M. Crochemore and W. Rytter. Text Algorithms. Oxford University Press, New York, USA, 1994.
[6] R. B. Dannenberg, W. P. Birmingham, G. Tzanetakis, C. Meek, N. Hu, and B. Pardo. The MUSART testbed for query-by-humming evaluation. In Hoos and Bainbridge [17], pages 41–47.
[7] C. J. Date. An Introduction to Database Systems. Addison-Wesley, Boston, USA, eighth edition, 2003.
[8] C. Dodge and T. A. Jerse. Computer Music: Synthesis, Composition, and Performance. Wadsworth, Belmont, USA, second edition, 1997.
[9] S. Doraisamy and S. M. Rüger. An approach towards a polyphonic music retrieval system. In Downie and Bainbridge [11].
[10] J. S. Downie. The Musifind music information retrieval project, phase III: Evaluation of indexing options. In Canadian Assoc. for Inf. Sci. Proc. the 23rd Annual Conf., Connectedness: Information, Systems, People, Organizations, pages 135–146. CAIS, 1995.
[11] J. S. Downie and D. Bainbridge, editors. International Symposium on Music Information Retrieval, volume 2, Bloomington, Indiana, USA, October 2001.
[12] D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge, UK, 1997.
[13] D. Hawking and N. Craswell. Overview of the TREC-2001 Web Track. In Voorhees and Harman [38], pages 61–68.
[14] S. Henikoff and J. G. Henikoff. Amino acid substitution matrices from protein blocks. In Proc. Natl. Acad. Sci. USA, volume 89, pages 10915–10919, November 1992.
[15] T. C. Hoad and J. Zobel. Methods for identifying versioned and plagiarized documents. J. Am. Soc. Inf. Sci. Technol., 54(3):203–215, February 2003.
[16] T. C. Hoad and J. Zobel. Video similarity detection for digital rights management. In M. Oudshoorn, editor, Proc. Australasian Computer Sci. Conf., pages 237–245, Adelaide, Australia, February 2003.
[17] H. H. Hoos and D. Bainbridge, editors. International Symposium on Music Information Retrieval, volume 4, Baltimore, Maryland, USA, October 2003.
[18] B. Hugh. Claude Debussy and the Javanese Gamelan. http://cctr.umkc.edu/userx/bhugh/recit/debnotes/gamelan.html. Accessed 16 December 2003.
[19] D. Huron. Humdrum and kern: Selective feature encoding. In Selfridge-Field [30], pages 375–401.
[20] T. Kageyama, K. Mochizuki, and Y. Takashima. Melody retrieval with humming. In Proc. Int. Computer Music Conf., pages 349–351, 1993.
[21] C. Meek and W. P. Birmingham. The dangers of parsimony in query-by-humming applications. In Hoos and Bainbridge [17], pages 51–56.
[22] MIDI Manufacturers Association. MIDI Tuning Bank and Dump Extensions. http://www.midi.org/about-midi/tuning_extens.shtml. Accessed 16 December 2003.
[23] M. Mongeau and D. Sankoff. Comparison of musical sequences. In Computers and the Humanities, volume 24, pages 161–175. Kluwer, 1990.
[24] G. Navarro and M. Raffinot. Flexible Pattern Matching in Strings: Practical On-line Search Algorithms for Texts and Biological Sequences. Cambridge University Press, Cambridge, UK, 2002.
[25] J. Pickens. A comparison of language modeling and probabilistic text information retrieval approaches to monophonic music retrieval. In Byrd et al. [2].
[26] J. G. Roederer. Introduction to the Physics and Psychophysics of Music. Springer-Verlag, New York, USA, second edition, 1975.
[27] P. Roland. XML4MIR: Extensible markup language for music information retrieval. In Byrd et al. [2].
[28] H. Schaffrath. The retrieval of monophonic melodies and their variants: Concepts and strategies for computer-aided analysis. In A. Marsden and A. Pople, editors, Computer Representations and Models in Music, pages 95–109. Academic Press, London, UK, 1992.
[29] H. Schaffrath. The Essen Associative Code: A code for folksong analysis. In Selfridge-Field [30], pages 343–361.
[30] E. Selfridge-Field, editor. Beyond MIDI: The Handbook of Musical Codes. MIT Press, Cambridge, USA, 1997.
[31] J. Shifrin and W. P. Birmingham. Effectiveness of HMM-based retrieval on large databases. In Hoos and Bainbridge [17], pages 33–39.
[32] A. Supandi, U. Ngalagena, I. Djunaedi, D. Sain, and R. S. Riswara. Teori Dasar Karawitan. Pelita Masa, Bandung, Indonesia, third edition, 1976.
[33] I. S. H. Suyoto, S. Uitdenbogerd, and J. Zobel. Microtonal music information retrieval. Research paper submission for RMIT School of Computer Science and Information Technology "Research Methods" course (unpublished), 2003.
[34] A. L. Uitdenbogerd. Music Information Retrieval Technology. PhD thesis, School of Computer Science and Information Technology, RMIT, Melbourne, Australia, 2002.
[35] A. L. Uitdenbogerd and J. Zobel. Matching techniques for large music databases. In D. Bulterman, K. Jeffay, and H. J. Zhang, editors, Proc. ACM Multimedia Conf., pages 57–66, Orlando, USA, November 1999.
[36] A. L. Uitdenbogerd and J. Zobel. Music ranking techniques evaluated. In M. Oudshoorn, editor, Proc. Australasian Computer Sci. Conf., pages 275–283, Melbourne, Australia, January 2002.
[37] E. M. Voorhees. Overview of the TREC-9 question answering track. In E. M. Voorhees and D. K. Harman, editors, Proc. Ninth Text REtrieval Conf., pages 71–79, Gaithersburg, USA, November 2000. National Institute of Standards and Technology.
[38] E. M. Voorhees and D. K. Harman, editors. Proc. Tenth Text REtrieval Conf., Gaithersburg, USA, November 2001. National Institute of Standards and Technology.
[39] E. M. Voorhees and D. M. Tice. The TREC-8 question answering track evaluation. In E. M. Voorhees and D. K. Harman, editors, Proc. Eighth Text REtrieval Conf., pages 83–105, Gaithersburg, USA, November 1999. National Institute of Standards and Technology.
[40] B. Wang, H. Xu, Z. Yang, Y. Liu, X. Cheng, D. Bu, and S. Bai. TREC-10 experiments at CAS-ICT: Filtering, web and QA. In Voorhees and Harman [38], pages 109–121.
[41] I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishing, San Francisco, USA, second edition, 1999.
[42] E. Zwicker and H. Fastl. Psychoacoustics: Facts and Models. Springer-Verlag, Berlin, Germany, second edition, 1999.