Use of Spectral Autocorrelation in Spectral Envelope Linear Prediction For Speech Recognition


By B. LALITHA (08691D3807)

Under the guidance of Prof. M. B. MANJUNATHA, M.Tech (PhD)
Introduction

• Outline of the project
1. Introduction to Speech
   i) Speech Production
   ii) Speech Recognition
2. Implementation
3. Spectral Envelope LPC Analysis
4. Speech Recognition using Dynamic Time Warping (DTW)
5. Results
1. INTRODUCTION TO SPEECH

i) Speech Production:

• Speech can be characterized as a signal carrying message information.
• The purpose of speech is communication between humans.
• Speech is an acoustic waveform that conveys information from a speaker to a listener.
Human Speech Communication

[Figure: the speech signal travels from the human speaker to the human ear]

Speech Production Mechanism

• Flow of air from lungs


• Vibrating vocal cords
• Speech production cavities
• Lips
• Sound wave
• Vowels (a, e, i), fricatives (f, s, z) and plosives (p, t, k)
Classification of Speech Signals

Voiced Sounds:
• Voiced sounds are produced when the vocal cords vibrate; quasi-periodic pulses of air excite the vocal tract.
• Examples are labeled / u /, / d /, / w /, / i /, and / e /.

Unvoiced or Fricative Sounds:
• Unvoiced sounds are produced by forming a constriction at some point in the vocal tract and forcing air through it at high velocity to produce turbulence.
• Examples are / ʃ / (the fricative "sh"), / f /, and / s /.
ii) Speech Recognition

The three basic steps in Automatic Speech Recognition (ASR) are:

1. Parameter Estimation
2. Parameter Comparison
3. Decision Making
IMPLEMENTATION

Input speech signal
    ↓
Pre-emphasis
    ↓
Hamming windowing
    ↓
Linear prediction (predictive filter)
    ↓
Spectral autocorrelation
    ↓
LPC coefficients
    ↓
Dynamic time warping  ←  reference word (total stored vectors extracted)
    ↓
Vector with minimum value
    ↓
Recognized word
Pre-Emphasis

• Boosts the energy in the high frequencies.
• The spectrum of voiced segments has more energy at lower frequencies than at higher frequencies.
  – This is called spectral tilt.
  – Spectral tilt is caused by the nature of the glottal pulse.
• Boosting high-frequency energy gives more information to the acoustic model.
  – It improves phone recognition performance.
Example of pre-emphasis

• Before and after pre-emphasis
  – Spectral slice from the vowel [aa]

% Pre-emphasis
b = [1 -15/16];
x = filter(b, 1, x);
Len = length(x);
% Resample: decimation by 4 (to 8 kHz)
% x = x(1:Fs/8000:Len);
x = resample(x, 8000, Fs);
Fs = 8000;
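For readers without MATLAB, the same pre-emphasis filter can be sketched in Python with NumPy (an illustrative equivalent of the slide's code, not part of the project; the resampling step is omitted, and `pre_emphasize` is a name chosen here):

```python
import numpy as np

def pre_emphasize(x, alpha=15/16):
    """First-order high-pass FIR filter: y[n] = x[n] - alpha * x[n-1]."""
    x = np.asarray(x, dtype=float)
    # First sample passes through unchanged; every later sample has a
    # scaled copy of its predecessor subtracted, boosting high frequencies.
    return np.append(x[0], x[1:] - alpha * x[:-1])

# A constant (DC) signal is almost completely suppressed after the first
# sample, confirming that the filter attenuates low frequencies.
y = pre_emphasize(np.ones(8))
```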
Pre-emphasized and Resampled
Windowing

• Speech is a non-stationary signal.
• A window is non-zero inside some region and zero elsewhere.
• The speech extracted from each window is called a frame.
Windowing

• The windowing process, showing the frame shift and frame size.

% Applying the window
for i = 1:n
    for j = 1:nbFrame
        M(i, j) = speech(((j - 1) * m) + i);
    end
end
h = hamming(n);
M2 = diag(h) * M;
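The framing-and-windowing loop can also be sketched in Python/NumPy (illustrative, not the project's code; `n` is the frame size and `m` the frame shift, mirroring the slide's variables):

```python
import numpy as np

def frame_and_window(speech, n, m):
    """Split `speech` into frames of length n with hop m (one frame per
    column, as in the MATLAB matrix M), then apply a Hamming window."""
    speech = np.asarray(speech, dtype=float)
    nb_frame = (len(speech) - n) // m + 1
    frames = np.stack([speech[j * m : j * m + n] for j in range(nb_frame)],
                      axis=1)
    # Multiply every column by the same length-n Hamming window.
    return np.hamming(n)[:, None] * frames

M2 = frame_and_window(np.arange(100.0), n=20, m=10)
```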
Common window shapes

• Rectangular window
• Hamming window
Linear prediction
• Linear Predictive Coding (LPC) provides
  – a low-dimensional representation of the speech signal in one frame
  – a representation of the spectral envelope, not the harmonics
  – an "analytically tractable" method
  – some ability to identify formants

• LPC models the speech signal at time point n as an approximate linear combination of the previous p samples:

  s(n) ≈ a_1 s(n−1) + a_2 s(n−2) + … + a_p s(n−p)

• where a_1, a_2, …, a_p are constant over each frame of speech.

If the error over a segment of speech is defined as

  E_n = Σ_{m=M1}^{M2} e_n²(m) = Σ_{m=M1}^{M2} [ s_n(m) − Σ_{k=1}^{p} a_k s_n(m−k) ]²

(where s_n is the signal starting at time n), then we can find the a_k by setting ∂E_n/∂a_k = 0 for k = 1, 2, …, p, obtaining p equations in p unknowns:

  Σ_{k=1}^{p} â_k Σ_{m=M1}^{M2} s_n(m−i) s_n(m−k) = Σ_{m=M1}^{M2} s_n(m−i) s_n(m),   1 ≤ i ≤ p

The error is a minimum (not a maximum) where the derivative is zero, because as any a_k moves away from its optimum value, the error increases.
Features: LPC

Starting from

  E_n = Σ_{m=M1}^{M2} [ s(m) − Σ_{k=1}^{p} a_k s(m−k) ]²

expanding the square gives

  E_n = Σ_{m=M1}^{M2} [ s²(m) − 2 s(m) Σ_{k=1}^{p} a_k s(m−k) + ( Σ_{k=1}^{p} a_k s(m−k) )( Σ_{r=1}^{p} a_r s(m−r) ) ]

Differentiating with respect to a_i and setting the result to zero,

  ∂E_n/∂a_i = Σ_{m=M1}^{M2} [ −2 s(m) s(m−i) + 2 Σ_{k=1}^{p} a_k s(m−i) s(m−k) ] = 0,   1 ≤ i ≤ p

Repeating this for each of a_1, a_2, …, a_p and dividing by 2 gives the normal equations:

  Σ_{k=1}^{p} a_k Σ_{m=M1}^{M2} s(m−i) s(m−k) = Σ_{m=M1}^{M2} s(m−i) s(m),   1 ≤ i ≤ p
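As a sanity check, the normal equations can be built and solved directly. A minimal NumPy sketch (illustrative, with φ(i, k) computed by brute force; on a noiseless AR(1) signal the predictor recovers the generating coefficient):

```python
import numpy as np

def lpc_normal_equations(s, p):
    """Solve sum_k a_k * phi(i,k) = phi(i,0) for predictor coefficients
    a_1..a_p, where phi(i,k) = sum_m s(m-i) s(m-k), m = p .. len(s)-1."""
    s = np.asarray(s, dtype=float)
    m = np.arange(p, len(s))
    phi = np.array([[np.dot(s[m - i], s[m - k]) for k in range(p + 1)]
                    for i in range(p + 1)])
    # Rows/columns 1..p form the system matrix; column 0 is the RHS.
    return np.linalg.solve(phi[1:, 1:], phi[1:, 0])

# A deterministic AR(1) signal s(n) = 0.9 s(n-1) is predicted exactly.
sig = 0.9 ** np.arange(50)
a = lpc_normal_equations(sig, p=1)
```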
LPC Autocorrelation Method

Autocorrelation is a measure of periodicity in a signal. Define

  φ_n(i, k) = Σ_{m=M1}^{M2} s_n(m−i) s_n(m−k)

so the normal equations can be rewritten as

  Σ_{k=1}^{p} â_k φ_n(i, k) = φ_n(i, 0),   1 ≤ i ≤ p

We can solve for the a_k by several methods. The most common method in speech processing is the "autocorrelation" method: force the signal to be zero outside the interval 0 ≤ m ≤ N−1,

  ŝ_n(m) = s_n(m) w(m)

where w(m) is a finite-length window (e.g. Hamming) of length N that is zero for m < 0 and m > N−1, and ŝ_n is the windowed signal. The error then becomes

  E_n = Σ_{m=0}^{N+p−1} e_n²(m)

and, because the signal is zero outside the window,

  φ_n(i, k) = Σ_{m=0}^{N−1−(i−k)} ŝ_n(m) ŝ_n(m + (i−k)),   1 ≤ i ≤ p, 0 ≤ k ≤ p

This is identical to the autocorrelation function evaluated at |i−k|, because the autocorrelation function is symmetric, R_n(−k) = R_n(k):

  φ_n(i, k) = R_n(|i−k|),   where   R_n(k) = Σ_{m=0}^{N−1−k} ŝ_n(m) ŝ_n(m+k)

so the set of normal equations reduces to

  Σ_{k=1}^{p} â_k R_n(|i−k|) = R_n(i),   1 ≤ i ≤ p
In matrix form, these equations look like this:

  | R_n(0)     R_n(1)     R_n(2)     …  R_n(p−1) |   | â_1 |   | R_n(1) |
  | R_n(1)     R_n(0)     R_n(1)     …  R_n(p−2) |   | â_2 |   | R_n(2) |
  | R_n(2)     R_n(1)     R_n(0)     …  R_n(p−3) | · | â_3 | = | R_n(3) |
  |   ⋮           ⋮          ⋮       ⋱      ⋮    |   |  ⋮  |   |   ⋮    |
  | R_n(p−1)   R_n(p−2)   R_n(p−3)   …  R_n(0)   |   | â_p |   | R_n(p) |

There is a recursive algorithm to solve this: Durbin's solution.

LPC: Durbin's Solution

Solve the Toeplitz (symmetric, with equal diagonal elements) system

  Σ_{k=1}^{p} â_k R_n(|i−k|) = R_n(i),   1 ≤ i ≤ p

with the recursion

  E^(0) = R(0)
  k_i = [ R(i) − Σ_{j=1}^{i−1} α_j^(i−1) R(i−j) ] / E^(i−1),   1 ≤ i ≤ p
  α_i^(i) = k_i
  α_j^(i) = α_j^(i−1) − k_i α_{i−j}^(i−1),   1 ≤ j ≤ i−1
  E^(i) = (1 − k_i²) E^(i−1)
  â_j = α_j^(p)
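Durbin's recursion translates almost line-for-line into code. A NumPy sketch (illustrative; `R` is the autocorrelation sequence R(0)…R(p)):

```python
import numpy as np

def durbin(R, p):
    """Levinson-Durbin recursion: solve the Toeplitz normal equations
    sum_k a_k R(|i-k|) = R(i), 1 <= i <= p, in O(p^2) operations."""
    E = R[0]                      # E^(0) = R(0)
    a = np.zeros(p + 1)           # a[j] holds alpha_j; a[0] is unused
    for i in range(1, p + 1):
        # k_i = (R(i) - sum_j alpha_j^(i-1) R(i-j)) / E^(i-1)
        k = (R[i] - np.dot(a[1:i], R[i - 1:0:-1])) / E
        a_new = a.copy()
        a_new[i] = k              # alpha_i^(i) = k_i
        for j in range(1, i):     # alpha_j^(i) = alpha_j^(i-1) - k_i alpha_{i-j}^(i-1)
            a_new[j] = a[j] - k * a[i - j]
        a = a_new
        E *= (1 - k * k)          # E^(i) = (1 - k_i^2) E^(i-1)
    return a[1:], E               # predictor coefficients and final error

# Check against a direct solve of the Toeplitz system.
R = np.array([2.0, 1.0, 0.5, 0.25])
a, E = durbin(R, p=3)
```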
We can compute the spectral envelope magnitude from the LPC parameters by evaluating the transfer function S(z) for z = e^(jω):

  S(e^(jω)) = G / A(e^(jω)) = G / ( 1 − Σ_{k=1}^{p} a_k e^(−jωk) )
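Evaluating |S(e^(jω))| = G/|A(e^(jω))| on a uniform frequency grid amounts to an FFT of the coefficient vector of A(z). A NumPy sketch (illustrative; G = 1 assumed):

```python
import numpy as np

def lpc_envelope_db(a, nfft=512, G=1.0):
    """Spectral envelope 20*log10( G / |A(e^{jw})| ) from LPC coefficients
    a_1..a_p, with A(z) = 1 - sum_k a_k z^{-k}."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    # Zero-padded FFT evaluates A on nfft points around the unit circle;
    # rfft keeps the nfft//2 + 1 non-negative frequencies.
    Aw = np.fft.rfft(A, nfft)
    return 20 * np.log10(G / np.abs(Aw))

# A single positive coefficient gives a low-pass envelope:
# the peak sits at frequency zero, where |A| = 1 - 0.9 = 0.1.
env = lpc_envelope_db([0.9], nfft=256)
```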
Finding the frequency envelope using the LPC method (MATLAB):

for col = 1:nbFrame
    % compute the Or-th order autocorrelation function:
    rx = zeros(1, Or+1)';
    speech1 = M2(:, col)' + 0.000001;
    for i = 1:Or+1
        rx(i) = rx(i) + speech1(1:n-i+1) * speech1(1+i-1:n)';
    end
    % prepare the Or-by-Or Toeplitz autocorrelation matrix:
    covmatrix = zeros(Or, Or);
    for i = 1:Or
        covmatrix(i, i:Or) = rx(1:Or-i+1)';
        covmatrix(i:Or, i) = rx(1:Or-i+1);
    end
    % solve the "normal equations" for the prediction coefficients
    Acoeffs = -covmatrix \ rx(2:Or+1);
    Alp = [1, Acoeffs'];  % LP polynomial A(z)
    dbenvlp(:, col) = 20*log10(abs(freqz(1, Alp, n*2)'));  % envelope in dB
end
Dynamic Time Warping (DTW)
• The SELP analysis is evaluated using dynamic time warping.
• T = {t1, t2, …, ti, …, tn},  R = {r1, r2, …, rj, …, rm}
• T = test signal (unknown signal)
• R = reference signal (known signal)
• A matrix of m × n is created.
• C(x, y) = MIN[ C(x + 1, y), C(x + 1, y + 1), C(x, y + 1) ] + D(x, y)
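A minimal DTW cost computation in Python (illustrative; it uses the forward form of the recurrence, accumulating from C(1,1) rather than the backward form shown above, with D taken here as the squared Euclidean distance between frame vectors):

```python
import numpy as np

def dtw_distance(T, R):
    """Accumulated DTW cost between test frames T (n x d) and
    reference frames R (m x d)."""
    T, R = np.atleast_2d(T), np.atleast_2d(R)
    n, m = len(T), len(R)
    # C[i, j] = minimum cost of aligning the first i test frames
    # with the first j reference frames; row/column 0 are sentinels.
    C = np.full((n + 1, m + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.sum((T[i - 1] - R[j - 1]) ** 2)   # local distance D
            C[i, j] = d + min(C[i - 1, j],           # insertion
                              C[i - 1, j - 1],       # match (diagonal)
                              C[i, j - 1])           # deletion
    return C[n, m]

# Identical sequences align with zero cost.
cost_same = dtw_distance(np.array([[0.], [1.], [2.]]),
                         np.array([[0.], [1.], [2.]]))
```

The recognized word is then the reference whose accumulated cost is the minimum over all stored templates.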
Dynamic Time Warping (DTW)
