
A Comparative Study of Various Pitch Detection Algorithms
Prajwal S Rao*, Khoushikh S*, Sriram Ravishankar*, R Advaith Ananthkrishnan*, Balachandra K*

*Department of Telecommunication Engineering


B.M.S. College of Engineering
Bengaluru, India
Email: balachandrak.tce@bmsce.ac.in

Abstract - Digital signal processing deals with performing varied mathematical operations on different types of signals. In this paper, we study pitch estimation algorithms applied to music signals. We give a detailed study of time domain and frequency domain pitch detection algorithms, namely Modified Autocorrelation, the Average Magnitude Difference Function, Yin, and Cepstrum. The algorithms have been tested with real and synthesized music signals, and their performance was compared based on time complexity and the error obtained.

Keywords - Pitch, AUTO, AMDF, YIN, CEPS, GER

I. INTRODUCTION
Speech is produced by the vibration of the vocal cords when air moves from the lungs (the powerhouse of speech) to the oral cavity through the trachea, or windpipe. The rate at which the vocal cords vibrate is known as the fundamental frequency, or pitch, of an individual. More generally, the number of vibrations per second of a wave in a material, such as a sound wave produced by an instrument, is called its frequency or pitch. Males generally have low-pitched voices, with frequencies ranging from 55 Hz to 131 Hz (roughly A1 to C3 on a standard keyboard or piano), while females have high-pitched voices, with frequencies ranging from 170 Hz to 262 Hz (roughly F3 to C4). Musical instruments cover a large frequency range, from as low as 16 Hz up to as high as 4000 Hz (C0 to B7). Frequencies above 1000 Hz are less significant, so we limit our area of interest to below 1000 Hz.


II. PITCH DETECTION ALGORITHMS
Pitch can be calculated in the time domain or in the frequency domain. Calculation in the time domain works directly on the actual audio data, whereas pitch detection in the frequency domain involves moving from the time space to the frequency space using operations like the Fourier transform.

A. Pitch Detection in Time Domain

Time domain algorithms process the data in its raw form, as it is usually read from a sound card: a series of uniformly spaced samples representing the movement of a waveform over time. For example, 44100 samples per second is a common recording speed. Each input sample x[n] is assumed to be a real number in the range -1 to 1 inclusive, with its value representing the height of the waveform at discrete time instance n. This section summarizes time domain pitch algorithms such as Autocorrelation (AUTO), the Square Difference Function or Yin method (YIN), the Average Magnitude Difference Function (AMDF), and the Simple Feature Based Method.

1) Modified Autocorrelation Method
One of the most popular methods for pitch estimation is autocorrelation. It takes an input function x[n] and cross-correlates it with itself; that is, each element is multiplied by the corresponding element of a shifted version of x[n], and the results are summed to get a single autocorrelation value.

\[ r[n] = \sum_{k=0}^{N-1-n} x[k]\, x[k+n] \tag{1} \]

where n = 1, 2, ..., N/2, N being the frame length.

Autocorrelation methods need at least two pitch periods to detect pitch. This means that in order to detect a fundamental frequency of 40 Hz, at least 50 milliseconds (ms) of the signal must be analyzed. Hence, the shift n is limited to half the window size. Equation (1) represents the autocorrelation of an input signal x[n]. If a signal is periodic with period p, then the autocorrelation has maxima at multiples of p, where the function matches itself. The autocorrelation function has its global maximum at n = 0 (the origin), which is the primary peak, and local maxima at integral multiples of p. So, the distance of the first local maximum from the origin represents the fundamental time period, and the inverse of the fundamental time period gives the pitch, or fundamental frequency. We eliminate all peaks below a threshold value so that peak picking becomes easier and the errors in checking peaks are reduced. This thresholding is what distinguishes the modified autocorrelation method from plain autocorrelation. Figure 1 is the block diagram for calculating pitch using AUTO.

Steps involved in finding the pitch of an audio sample by modified autocorrelation:
1. The entire audio is broken into smaller segments, which is known as windowing. We chose a window size of 2048 samples. The minimum frequency or pitch obtainable using 2048 samples is approximately 50 Hz. We multiply the segmented audio by a suitable window function.

978-1-7281-9180-5/20/$31.00 ©2020 IEEE

Authorized licensed use limited to: Tsinghua University. Downloaded on December 19,2020 at 09:50:13 UTC from IEEE Xplore. Restrictions apply.

2. Autocorrelation is performed on the windowed audio.
3. We eliminate all the values below a threshold value and then check for peaks or maxima. The index of the maximum is noted.
4. The pitch or fundamental frequency of the windowed audio is obtained by dividing the sampling frequency (Fs) by the maxima index.
5. We increment the window pointer by the hop length. If we use 50 percent overlap, we increment the pointer by 1024 samples.
6. We check if the window pointer has reached the end of the audio. If it has not, we go back to step 1; otherwise the computation of pitch for the entire audio is complete, and we perform the desired operation on the pitch, such as plotting.

Fig. 1. Block diagram representation of pitch estimation using modified autocorrelation

Fig. 2. Modified autocorrelation applied on a windowed audio

Figure 2 shows autocorrelation applied on a windowed audio. Since autocorrelation is an even function, it is symmetric about the origin and we consider only one half of the correlated signal. The figure shows a global maximum value of 6.99 at the origin, which is the energy of the signal. The first local maximum is present at the 61st sample. The sampling frequency (Fs) for the windowed sample is 48000 samples/sec. So, the pitch or fundamental frequency is given by Fs/maxima index, which is 48000/61 ≈ 787 Hz.

2) Average Magnitude Difference Function

The idea behind the Average Magnitude Difference Function (AMDF) is that if a signal is pseudo-periodic, then any two adjacent periods of the waveform are similar in shape. So, if the waveform is shifted by one period and compared to its original self, most of the peaks and troughs line up well. If one simply takes the differences from one waveform to the other and sums them up, the result is not useful, as some values are positive and some negative, tending to cancel each other out. This can be dealt with by using the absolute value of the differences and averaging them, as in equation (2) below.

\[ s[n] = \sum_{k=0}^{N-1} \left| x[k] - x[k-n] \right| \tag{2} \]

where n = 1, 2, ..., N, N being the frame length, and the index k - n is taken circularly within the frame, matching the circular shift used in the steps below.
When the waveform is shifted by an amount n that is not the period, the differences become greater and cause an increased sum, whereas when n equals the period the sum tends to a minimum. In this algorithm we find the location of the local minima. Figure 3 is the block diagram for calculating pitch using AMDF.

Steps involved in finding the pitch of an audio sample by the Average Magnitude Difference Function:
1. The entire audio is broken into smaller segments, which is known as windowing. We chose a window size of 2048 samples. The minimum frequency or pitch obtainable using 2048 samples is approximately 50 Hz. We multiply the segmented audio by a suitable window function.
2. We take the absolute difference of the windowed audio and the circularly shifted windowed audio.
3. We sum the result obtained in step 2 and store the sum.
4. We circularly shift the windowed audio and go back to step 2. The process of circular shifting and taking the absolute difference is repeated until we get back the initial windowed signal.
5. We check for the minimum value obtained in step 3, i.e. the minimum of the sums. The index of the minimum is noted.
6. The pitch or fundamental frequency of the windowed audio is obtained by dividing the sampling frequency (Fs) by the minima index.
7. We increment the window pointer by the hop length. If we use 50 percent overlap, we increment the pointer by 1024 samples.
8. We check if the window pointer has reached the end of the audio. If it has not, we go back to step 1; otherwise the computation of pitch for the entire audio is complete, and we perform the desired operation on the pitch, such as plotting.

Fig. 3. Block diagram representation of pitch estimation using AMDF algorithm

Fig. 4. AMDF applied on a windowed audio
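
As a rough illustration of the two time domain procedures described so far, the following Python sketch estimates the pitch of a single frame with thresholded autocorrelation and with AMDF. It is a minimal sketch under stated assumptions, not the MATLAB implementation used in this study: the function names, the 0.5 threshold ratio, the 50 Hz to 1000 Hz search range, and the synthetic 441 Hz test tone are all illustrative choices, and no window function is applied.

```python
import math

FS = 44100          # sampling frequency in Hz (assumed for this example)
FRAME_LEN = 2048    # window size used in the text


def make_sine(freq_hz, fs, n_samples):
    """Synthetic test tone; stands in for one frame of recorded audio."""
    return [math.sin(2.0 * math.pi * freq_hz * i / fs) for i in range(n_samples)]


def autocorrelation(frame, max_lag):
    """r[lag] = sum_k x[k] * x[k + lag], for lag = 0 .. max_lag."""
    n = len(frame)
    return [
        sum(frame[k] * frame[k + lag] for k in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def pitch_by_autocorrelation(frame, fs, threshold_ratio=0.5):
    """Return fs / lag for the first local maximum above the threshold."""
    r = autocorrelation(frame, len(frame) // 2)
    threshold = threshold_ratio * r[0]   # hypothetical choice: a fraction of r[0]
    for lag in range(1, len(r) - 1):
        is_peak = r[lag - 1] < r[lag] >= r[lag + 1]
        if is_peak and r[lag] > threshold:
            return fs / lag
    return None  # no peak survived the threshold


def amdf(frame, lag):
    """Sum of absolute differences between the frame and its circular shift."""
    n = len(frame)
    return sum(abs(frame[k] - frame[(k - lag) % n]) for k in range(n))


def pitch_by_amdf(frame, fs, f_min=50.0, f_max=1000.0):
    """Return fs / lag for the lag with the smallest AMDF in the search range."""
    lag_min = int(fs / f_max)                        # shortest period considered
    lag_max = min(int(fs / f_min), len(frame) // 2)  # longest period considered
    best_lag = min(range(lag_min, lag_max + 1), key=lambda lag: amdf(frame, lag))
    return fs / best_lag


frame = make_sine(441.0, FS, FRAME_LEN)
auto_pitch = pitch_by_autocorrelation(frame, FS)
amdf_pitch = pitch_by_amdf(frame, FS)
print("AUTO estimate (Hz):", auto_pitch)
print("AMDF estimate (Hz):", amdf_pitch)
```

Restricting the AMDF search to a plausible lag range is a design choice of this sketch: very small shifts also give small difference sums, so an unrestricted minimum search would tend to pick a lag close to zero.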

Figure 4 shows the average magnitude difference function applied on a windowed audio. The first local minimum is present at the 133rd sample. The sampling frequency (Fs) for the windowed sample is 44100 samples/sec. So, the pitch or fundamental frequency is given by Fs/minima index, which is 44100/133 = 331.58 Hz.

3) Yin Method
The idea behind the square difference function (SDF), or Yin method, is that if a signal is pseudo-periodic, then any two adjacent periods of the waveform are similar in shape. So, if the waveform is shifted by one period and compared to its original self, most of the peaks and troughs line up well. If one simply takes the differences from one waveform to the other and sums them up, the result is not useful, as some values are positive and some negative, tending to cancel each other out. This could be dealt with by using the absolute value of the difference, as in equation (2); however, it is more common to sum the squares of the differences, where each term contributes a non-negative amount to the total.

\[ s[n] = \sum_{k=0}^{N-1} \left( x[k] - x[k-n] \right)^{2} \tag{3} \]

where n = 1, 2, ..., N, N being the frame length, and the index k - n is taken circularly within the frame.
When the waveform is shifted by an amount n that is not the period, the differences become greater and cause an increased sum, whereas when n equals the period the sum tends to a minimum. In this algorithm we find the location of the local minima. Figure 5 is the block diagram for calculating pitch using YIN.

Fig. 5. Block diagram representation of pitch estimation using Yin algorithm

Steps involved in finding the pitch of an audio sample by the Yin method:
1. The entire audio is broken into smaller segments, which is known as windowing. We chose a window size of 2048 samples. The minimum frequency or pitch obtainable using 2048 samples is approximately 50 Hz. We multiply the segmented audio by a suitable window function.
2. We take the squared difference of the windowed audio and the circularly shifted windowed audio.
3. We sum the result obtained in step 2 and store the sum.
4. We circularly shift the windowed audio and go back to step 2. The process of circular shifting and taking the squared difference is repeated until we get back the initial windowed signal.
5. We check for the minimum value obtained in step 3, i.e. the minimum of the sums. The index of the minimum is noted.
6. The pitch or fundamental frequency of the windowed audio is obtained by dividing the sampling frequency (Fs) by the minima index.
7. We increment the window pointer by the hop length. If we use 50 percent overlap, we increment the pointer by 1024 samples.
8. We check if the window pointer has reached the end of the audio. If it has not, we go back to step 1; otherwise the computation of pitch for the entire audio is complete, and we perform the desired operation on the pitch, such as plotting.

Figure 6 shows the Yin method applied on a windowed audio. The first local minimum is present at the 343rd sample. The sampling frequency (Fs) for the windowed sample is 44100 samples/sec. So, the pitch or fundamental frequency is given by Fs/minima index, which is 44100/343 = 128.57 Hz.

Fig. 6. Yin applied on a windowed audio

B. Pitch Detection in Frequency Domain

Frequency domain algorithms do not investigate properties of the raw signal directly. Instead, we first pre-process the raw time domain data, transforming it into the frequency space using the Fourier transform. Frequency domain pitch algorithms include the Harmonic Product Spectrum, the Subharmonic-to-Harmonic Ratio, the Complex Cepstrum, and the Spectrum Peak Method.

1) Cepstrum Method
The cepstrum method is used for pitch detection when we have signals that are related to each other by convolution. We first turn this convolutional relationship into a linear (additive) one, as follows. First, we move to the frequency domain, where convolution is converted into multiplication. Next, we take the logarithm, so that the product becomes a sum, and finally we move back to the time domain.

\[ X(\omega) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n} \tag{4} \]

\[ \log\big(X(\omega)\big) = \log\left|X(\omega)\right| + j \arg\big(X(\omega)\big) \tag{5} \]

\[ x'[n] = \frac{1}{2\pi} \int_{0}^{2\pi} \log\big(X(\omega)\big)\, e^{j\omega n}\, d\omega \tag{6} \]
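
The transform chain of equations (4) to (6) can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation used in this study: it uses the real cepstrum (log magnitude only), a common simplification that drops the phase term of equation (5); the small radix-2 FFT, the function names, the 50 Hz to 1000 Hz search range, and the impulse-train test signal with a period of 100 samples are illustrative choices, and no window function is applied.

```python
import cmath
import math


def fft(values):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(values):
    """Inverse FFT via the conjugate trick: conj(fft(conj(X))) / N."""
    n = len(values)
    conjugated = fft([complex(v).conjugate() for v in values])
    return [v.conjugate() / n for v in conjugated]


def real_cepstrum(frame):
    """Inverse transform of the log magnitude spectrum (phase term dropped)."""
    spectrum = fft(frame)
    # The small constant guards against log(0); it is an assumption of this sketch.
    log_magnitude = [math.log(abs(v) + 1e-12) for v in spectrum]
    return [v.real for v in ifft(log_magnitude)]


def pitch_by_cepstrum(frame, fs, f_min=50.0, f_max=1000.0):
    """Return fs / q for the largest cepstral value in the search range."""
    cepstrum = real_cepstrum(frame)
    q_min = int(fs / f_max)
    q_max = min(int(fs / f_min), len(frame) // 2)
    best_q = max(range(q_min, q_max + 1), key=lambda q: cepstrum[q])
    return fs / best_q


FS = 44100          # sampling frequency in Hz (assumed for this example)
FRAME_LEN = 2048    # window size used in the text
PERIOD = 100        # samples per period, i.e. 44100 / 100 = 441 Hz

# Harmonic-rich test frame: one unit impulse per period.
frame = [1.0 if i % PERIOD == 0 else 0.0 for i in range(FRAME_LEN)]
cepstrum_pitch = pitch_by_cepstrum(frame, FS)
print("CEPS estimate (Hz):", cepstrum_pitch)
```

An impulse train is used here because the cepstrum relies on the regular spacing of harmonics in the log spectrum; a pure sine has no harmonics and would not produce a clear cepstral peak.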

Fig. 7. Block diagram representation of pitch estimation using CEPS algorithm

Steps involved in finding the pitch of an audio sample by the Cepstrum method:
1. The entire audio is broken into smaller segments, which is known as windowing. We chose a window size of 2048 samples. The minimum frequency or pitch obtainable using 2048 samples is approximately 50 Hz. We multiply the segmented audio by a suitable window function.
2. We then move to the frequency domain by applying the FFT (fast Fourier transform) to the windowed audio.
3. We apply the logarithm to the transformed audio.
4. We move back to the time domain by taking the IFFT (inverse fast Fourier transform).
5. Now we check for peaks or maxima. The index of the maximum is noted.
6. The pitch or fundamental frequency of the windowed audio is obtained by dividing the sampling frequency (Fs) by the maxima index.
7. We increment the window pointer by the hop length. If we use 50 percent overlap, we increment the pointer by 1024 samples.
8. We check if the window pointer has reached the end of the audio. If it has not, we go back to step 1; otherwise the computation of pitch for the entire audio is complete, and we perform the desired operation on the pitch, such as plotting.

III. COMPARISON OF MULTIPLE PITCH DETECTION ALGORITHMS
A. Time Complexity / Computation Time
These pitch detection algorithms were implemented in MATLAB. Table I consolidates the computation time for the pitch detection algorithms. The time required for the calculation of pitch was also measured using MATLAB.
The data set for checking pitch included standard notes used in music, played by various instruments. The musical notes E6, E4, E3, G5, B2, A4, and C1 correspond to 1295 Hz, 324 Hz, 162 Hz, 770 Hz, 121 Hz, 440 Hz, and 32 Hz respectively. Instruments like violin, trumpet, bass, flute, guitar, and oboe were used.

1) Modified Autocorrelation Method
The time complexity of the AUTO method is O(n), since it just involves traversing the windowed segments once, i.e. running through the audio length. The major share of the time is spent determining the peaks of the autocorrelated audio.

2) Average Magnitude Difference Function
The time complexity of the AMDF method is O(n^2), since it involves two nested traversals. The outer traversal runs through the audio signal and the inner traversal circularly shifts the windowed sample. The subtraction of the two sequences does not take significant time.

3) Yin Method
The time complexity of the YIN method is O(n^2), since it involves two nested traversals. The outer traversal runs through the audio signal and the inner traversal circularly shifts the windowed sample. In addition, significant time is spent on the multiplication of the two sequences. For these two reasons, Yin takes more time for computation.

4) Cepstrum Method
The time complexity of the CEPS method is O(n), since it just involves traversing the windowed segments once, i.e. running through the audio length. The major share of the time is spent determining the peaks of the processed audio.

TABLE I. Computation time (in seconds) for various pitch detection algorithms

Instrument samples   AUTO    AMDF    YIN     CEPS
Violin-e6            0.267   0.769   1.373   0.125
Violin-e4            0.215   0.323   0.507   0.115
Trumpet-e3           0.307   1.396   2.432   0.134
Oboe-g5              0.250   0.643   0.986   0.120
Guitar-b2            0.279   0.991   1.748   0.137
Flute-a4             0.366   2.261   3.672   0.142
Bass-c1              0.327   1.057   1.855   0.130

Fig. 8. Computation time (in s) for various pitch detection algorithms

B. Error in Multiple Pitch Detection Algorithms
Errors in the calculated pitch can arise for many reasons. We calculate the Gross Error (GER) using the equation stated below.

\[ GER = \sum_{i=1}^{N} \frac{\left| \text{ActualFrequency} - \text{DetectedFrequency} \right|}{\text{ActualFrequency}} \times 100\% \tag{7} \]
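
The Gross Error of equation (7) can be evaluated frame by frame. The short Python sketch below is illustrative only: the function name, the optional averaging over frames, and the detected pitch values are hypothetical and are not taken from the experiments in this paper.

```python
def gross_error_percent(actual_hz, detected_hz, average=False):
    """Sum of per-frame relative errors in percent, as in equation (7)."""
    errors = [abs(actual_hz - d) / actual_hz * 100.0 for d in detected_hz]
    total = sum(errors)
    # Dividing by the number of frames is an assumption of this sketch;
    # equation (7) as written is a plain sum over frames.
    return total / len(errors) if average else total


# Hypothetical detected pitches for three frames of a 440 Hz note.
detected = [440.0, 444.4, 435.6]
ger_sum = gross_error_percent(440.0, detected)
ger_mean = gross_error_percent(440.0, detected, average=True)
print("GER, summed over frames (%):", ger_sum)
print("GER, averaged over frames (%):", ger_mean)
```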

In equation (7), the summation runs from 1 to the total number of frames. We sum the absolute error to avoid cancellation. In autocorrelation, the proper peak may not be detected due to improper thresholding, as a result of which the wrong pitch is detected. A similar error can be expected in the case of the cepstrum method.
If the fundamental frequency is not dominant compared to its harmonics, the cepstrum method fails. This can be seen when the B2 note is played on a guitar. The amplitude at 121 Hz is much lower than the amplitude at 242 Hz (twice the fundamental of the B2 note) in the spectral domain. As a result, we got a GER of 187%. All the pitch detection algorithms used perform well when the pitch is below 1000 Hz, i.e. for low frequency components. So, when the E6 note is played, the chance of error is higher, which is evident when compared to the other notes. Figure 9 and Table II show a comparison of the error percentage for the pitch detection algorithms.

Fig. 9. Comparison of error (in percentage) for multiple pitch detection algorithms

TABLE II. Comparison of error in various pitch detection algorithms (in %)

Instrument samples   AUTO    AMDF    YIN     CEPS
Violin-e6            19.64   16.65   15.8    16.64
Violin-e4            2.78    1.36    1.56    4.55
Trumpet-e3           2.82    1.45    1.32    4.75
Oboe-g5              4.61    1.25    1.25    9.42
Flute-a4             1.79    1.02    1.21    4.62
Bass-c1              9.74    5.42    5.25    6.92

IV. RESULT AND DISCUSSION

We implemented a live pitch detector in MATLAB that uses each of the pitch detection algorithms, and obtained the following results. When the AMDF and YIN methods were used for pitch detection the latency was around 1 to 2 seconds, whereas the AUTO and CEPS methods had a latency of less than 1 second. The order of latency is given below, with CEPS taking the least time and YIN having the highest latency.
CEPS < AUTO < AMDF < YIN
We used a recorded piece of music, an audio signal of about 10 seconds (roughly 500,000 samples) containing various frequencies, and found that the time required for computation is highest for YIN and lowest for the CEPS method. The detected pitch is close to the actual frequency in many frames when using the YIN method, whereas the CEPS method showed more error.
Error in computation follows the order
CEPS > AUTO > AMDF > YIN
The order of computation time is the same as in the live calculation above, which is
CEPS < AUTO < AMDF < YIN
Figure 10 shows the detection of pitch using the various pitch detection algorithms for a piano audio.

Fig. 10. Comparison of various pitch plots

We discussed parameters like time complexity and GER for the pitch detection algorithms. Here we discuss the pros and cons of each algorithm.

A. Modified Autocorrelation Method

1. Advantages
i. This method of pitch detection is easy and has fast computation.
ii. The concept is easy to understand, since its mathematical modeling is simple.
2. Disadvantages
i. Choosing the peak threshold level is challenging. Peak picking may return peaks that occur before the first true local maximum, which are not the actual pitch peak. Hence, we make use of an adaptive autocorrelation function.
ii. The error in calculating the pitch is moderate.

B. Average Magnitude Difference Function

1. Advantages
i. The mathematics behind the method is simple.
ii. The error in the calculation of pitch is low.
2. Disadvantages
i. The time for pitch computation is moderate.

C. Yin Method

1. Advantages
i. The method is relatively simple and may be implemented efficiently and with low latency, and may be extended in several ways to handle several forms of aperiodicity that occur in particular applications.
ii. The error in the calculation of pitch using this algorithm is low.
2. Disadvantages
i. The time required for the computation of pitch is higher.
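
To make the simplicity of the Yin method concrete, the square difference function of equation (3) can be sketched in a few lines of Python. This is a minimal sketch of the SDF as described in this paper, not the full YIN estimator of de Cheveigné and Kawahara, which adds further normalization and thresholding steps. The function names, the 50 Hz to 1000 Hz search range, and the synthetic 441 Hz test tone are assumptions made for illustration, and no window function is applied.

```python
import math


def square_difference(frame, lag):
    """Sum of squared differences between the frame and its circular shift."""
    n = len(frame)
    return sum((frame[k] - frame[(k - lag) % n]) ** 2 for k in range(n))


def pitch_by_sdf(frame, fs, f_min=50.0, f_max=1000.0):
    """Return fs / lag for the lag with the smallest SDF in the search range."""
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) // 2)
    best_lag = min(
        range(lag_min, lag_max + 1),
        key=lambda lag: square_difference(frame, lag),
    )
    return fs / best_lag


FS = 44100  # sampling frequency in Hz (assumed for this example)
frame = [math.sin(2.0 * math.pi * 441.0 * i / FS) for i in range(2048)]
sdf_pitch = pitch_by_sdf(frame, FS)
print("SDF estimate (Hz):", sdf_pitch)
```

The nested iteration over lags and samples is the source of the O(n^2) cost per frame noted for this method.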

D. Cepstrum Method

1. Advantages
i. Cepstrum analysis, once understood, is a very simple way to estimate spectral components, since it does not deal with phase.
ii. Computation time is moderate, since it involves the FFT and IFFT repeatedly.
2. Disadvantages
i. Performing the FFT and IFFT is computationally expensive, and there is a chance of losing some data.
ii. Cepstrum analysis essentially low-pass filters the spectral components, which averages them.
iii. For the cepstrum method the fundamental must be dominant compared to its harmonics, otherwise the chance of error is high. This is explained in the GER part.

V. CONCLUSION
This paper discusses multiple pitch detection algorithms. The algorithms discussed above were implemented in MATLAB, and a comparative analysis of computation time and GER was carried out for the pitch detection algorithms AUTO, AMDF, YIN, and CEPS. These can be implemented in various musical applications. For live pitch detection applications, where we need less computation time, we can use algorithms like AUTO or CEPS, whereas for other applications like pitch shifting, where we need less error, we can use algorithms like AMDF and YIN, since computation time is not the priority.
