Use of Spectral Autocorrelation in Spectral Envelope Linear Prediction For Speech Recognition


By B. LALITHA (08691D3807)

Under the guidance of Prof. M. B. MANJUNATHA, M.Tech (PhD)
Introduction

• Outline of the project
1. Introduction to Speech
   i) Speech Production
   ii) Speech Recognition
2. Implementation
3. Spectral Envelope LPC Analysis
4. Speech Recognition using Dynamic Time Warping (DTW)
5. Results
1. INTRODUCTION TO SPEECH

i) Speech Production:

• Speech can be characterized as a signal carrying message information.
• The purpose of speech is communication between humans.
• Speech is an acoustic waveform that conveys information from a speaker to a listener.
Human Speech Communication

[Figure: the speech signal travels from the human speaker to the human ear]

Speech Production Mechanism

• Flow of air from lungs


• Vibrating vocal cords
• Speech production cavities
• Lips
• Sound wave
• Vowels (a, e, i), fricatives (f, s, z) and plosives (p, t, k)
Classification of Speech Signals

Voiced Sounds:
• Voiced sounds are produced when the vocal cords vibrate; quasi-periodic pulses of air excite the vocal tract.
• Examples are labeled / u /, / d /, / w /, / i /, and / e /.

Unvoiced or Fricative Sounds:
• Unvoiced sounds are produced by forming a constriction at some point in the vocal tract and forcing air through it at high velocity to produce turbulence.
• Examples are / ʃ / (the fricative "sh"), / f /, and / s /.
ii) Speech Recognition

The three basic steps in Automatic Speech Recognition (ASR) are:

1. Parameter Estimation
2. Parameter Comparison
3. Decision Making
IMPLEMENTATION

Input speech signal
    ↓
Pre-emphasis
    ↓
Hamming windowing
    ↓
Linear prediction (predictive filter)
    ↓
Spectral autocorrelation
    ↓
LPC coefficients
    ↓
Dynamic time warping  ←  reference word (total stored vectors extracted)
    ↓
Vector with minimum value
    ↓
Recognized word
Pre-Emphasis

• Boosts the energy in the high frequencies.
• The spectrum of voiced segments has more energy at lower frequencies than at higher frequencies.
  – This is called spectral tilt.
  – Spectral tilt is caused by the nature of the glottal pulse.
• Boosting high-frequency energy gives more information to the acoustic model.
  – It improves phone recognition performance.
Example of pre-emphasis

• Before and after pre-emphasis
  – Spectral slice from the vowel [aa]

% Pre-emphasis
b = [1 -15/16];
x = filter(b, 1, x);
Len = length(x);
% Resample: decimation by 4 (to 8 kHz)
% x = x(1:Fs/8000:Len);
x = resample(x, 8000, Fs);
Fs = 8000;
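For readers without MATLAB, the same pre-emphasis filter can be sketched in Python with NumPy (an illustrative equivalent of the slide's code, not part of the project; the resampling step is omitted, and `pre_emphasize` is a name chosen here):

```python
import numpy as np

def pre_emphasize(x, alpha=15/16):
    """First-order high-pass FIR filter: y[n] = x[n] - alpha * x[n-1]."""
    x = np.asarray(x, dtype=float)
    # First sample passes through unchanged; every later sample has a
    # scaled copy of its predecessor subtracted, boosting high frequencies.
    return np.append(x[0], x[1:] - alpha * x[:-1])

# A constant (DC) signal is almost completely suppressed after the first
# sample, confirming that the filter attenuates low frequencies.
y = pre_emphasize(np.ones(8))
```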
Pre-emphasized and Resampled
Windowing

• Speech is a non-stationary signal.
• A window is non-zero inside some region and zero elsewhere.
• The speech extracted from each window is called a frame.
Windowing

• The windowing process, showing the frame shift and frame size.

% Applying the window
for i = 1:n
    for j = 1:nbFrame
        M(i, j) = speech(((j - 1) * m) + i);
    end
end
h = hamming(n);
M2 = diag(h) * M;
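The framing-and-windowing loop can also be sketched in Python/NumPy (illustrative, not the project's code; `n` is the frame size and `m` the frame shift, mirroring the slide's variables):

```python
import numpy as np

def frame_and_window(speech, n, m):
    """Split `speech` into frames of length n with hop m (one frame per
    column, as in the MATLAB matrix M), then apply a Hamming window."""
    speech = np.asarray(speech, dtype=float)
    nb_frame = (len(speech) - n) // m + 1
    frames = np.stack([speech[j * m : j * m + n] for j in range(nb_frame)],
                      axis=1)
    # Multiply every column by the same length-n Hamming window.
    return np.hamming(n)[:, None] * frames

M2 = frame_and_window(np.arange(100.0), n=20, m=10)
```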
Common window shapes

• Rectangular window
• Hamming window
Linear prediction
• Linear Predictive Coding (LPC) provides
  – a low-dimensional representation of the speech signal in one frame
  – a representation of the spectral envelope, not the harmonics
  – an "analytically tractable" method
  – some ability to identify formants

• LPC models the speech signal at time point n as an approximate linear combination of the previous p samples:

  s(n) ≈ a_1 s(n−1) + a_2 s(n−2) + … + a_p s(n−p)

• where a_1, a_2, …, a_p are constant over each frame of speech.

If the error over a segment of speech is defined as

  E_n = Σ_{m=M1}^{M2} e_n²(m) = Σ_{m=M1}^{M2} [ s_n(m) − Σ_{k=1}^{p} a_k s_n(m−k) ]²

(where s_n is the signal starting at time n), then we can find the a_k by setting ∂E_n/∂a_k = 0 for k = 1, 2, …, p, obtaining p equations in p unknowns:

  Σ_{k=1}^{p} â_k Σ_{m=M1}^{M2} s_n(m−i) s_n(m−k) = Σ_{m=M1}^{M2} s_n(m−i) s_n(m),   1 ≤ i ≤ p

The error is a minimum (not a maximum) where the derivative is zero, because as any a_k moves away from its optimum value, the error increases.
Features: LPC

Starting from

  E_n = Σ_{m=M1}^{M2} [ s(m) − Σ_{k=1}^{p} a_k s(m−k) ]²

expanding the square gives

  E_n = Σ_{m=M1}^{M2} [ s²(m) − 2 s(m) Σ_{k=1}^{p} a_k s(m−k) + ( Σ_{k=1}^{p} a_k s(m−k) )( Σ_{r=1}^{p} a_r s(m−r) ) ]

Differentiating with respect to a_i and setting the result to zero,

  ∂E_n/∂a_i = Σ_{m=M1}^{M2} [ −2 s(m) s(m−i) + 2 Σ_{k=1}^{p} a_k s(m−i) s(m−k) ] = 0,   1 ≤ i ≤ p

Repeating this for each of a_1, a_2, …, a_p and dividing by 2 gives the normal equations:

  Σ_{k=1}^{p} a_k Σ_{m=M1}^{M2} s(m−i) s(m−k) = Σ_{m=M1}^{M2} s(m−i) s(m),   1 ≤ i ≤ p
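As a sanity check, the normal equations can be built and solved directly. A minimal NumPy sketch (illustrative, with φ(i, k) computed by brute force; on a noiseless AR(1) signal the predictor recovers the generating coefficient):

```python
import numpy as np

def lpc_normal_equations(s, p):
    """Solve sum_k a_k * phi(i,k) = phi(i,0) for predictor coefficients
    a_1..a_p, where phi(i,k) = sum_m s(m-i) s(m-k), m = p .. len(s)-1."""
    s = np.asarray(s, dtype=float)
    m = np.arange(p, len(s))
    phi = np.array([[np.dot(s[m - i], s[m - k]) for k in range(p + 1)]
                    for i in range(p + 1)])
    # Rows/columns 1..p form the system matrix; column 0 is the RHS.
    return np.linalg.solve(phi[1:, 1:], phi[1:, 0])

# A deterministic AR(1) signal s(n) = 0.9 s(n-1) is predicted exactly.
sig = 0.9 ** np.arange(50)
a = lpc_normal_equations(sig, p=1)
```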
LPC Autocorrelation Method

Autocorrelation is a measure of periodicity in a signal. Define

  φ_n(i, k) = Σ_{m=M1}^{M2} s_n(m−i) s_n(m−k)

so the normal equations can be rewritten as

  Σ_{k=1}^{p} â_k φ_n(i, k) = φ_n(i, 0),   1 ≤ i ≤ p

We can solve for the a_k by several methods. The most common method in speech processing is the "autocorrelation" method: force the signal to be zero outside the interval 0 ≤ m ≤ N−1,

  ŝ_n(m) = s_n(m) w(m)

where w(m) is a finite-length window (e.g. Hamming) of length N that is zero for m < 0 and m > N−1, and ŝ_n is the windowed signal. The error then becomes

  E_n = Σ_{m=0}^{N+p−1} e_n²(m)

and, because the signal is zero outside the window,

  φ_n(i, k) = Σ_{m=0}^{N−1−(i−k)} ŝ_n(m) ŝ_n(m + (i−k)),   1 ≤ i ≤ p, 0 ≤ k ≤ p

This is identical to the autocorrelation function evaluated at |i−k|, because the autocorrelation function is symmetric, R_n(−k) = R_n(k):

  φ_n(i, k) = R_n(|i−k|),   where   R_n(k) = Σ_{m=0}^{N−1−k} ŝ_n(m) ŝ_n(m+k)

so the set of normal equations reduces to

  Σ_{k=1}^{p} â_k R_n(|i−k|) = R_n(i),   1 ≤ i ≤ p
In matrix form, these equations look like this:

  | R_n(0)     R_n(1)     R_n(2)     …  R_n(p−1) |   | â_1 |   | R_n(1) |
  | R_n(1)     R_n(0)     R_n(1)     …  R_n(p−2) |   | â_2 |   | R_n(2) |
  | R_n(2)     R_n(1)     R_n(0)     …  R_n(p−3) | · | â_3 | = | R_n(3) |
  |   ⋮           ⋮          ⋮       ⋱      ⋮    |   |  ⋮  |   |   ⋮    |
  | R_n(p−1)   R_n(p−2)   R_n(p−3)   …  R_n(0)   |   | â_p |   | R_n(p) |

There is a recursive algorithm to solve this: Durbin's solution.

LPC: Durbin's Solution

Solve the Toeplitz (symmetric, with equal diagonal elements) system

  Σ_{k=1}^{p} â_k R_n(|i−k|) = R_n(i),   1 ≤ i ≤ p

with the recursion

  E^(0) = R(0)
  k_i = [ R(i) − Σ_{j=1}^{i−1} α_j^(i−1) R(i−j) ] / E^(i−1),   1 ≤ i ≤ p
  α_i^(i) = k_i
  α_j^(i) = α_j^(i−1) − k_i α_{i−j}^(i−1),   1 ≤ j ≤ i−1
  E^(i) = (1 − k_i²) E^(i−1)
  â_j = α_j^(p)
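Durbin's recursion translates almost line-for-line into code. A NumPy sketch (illustrative; `R` is the autocorrelation sequence R(0)…R(p)):

```python
import numpy as np

def durbin(R, p):
    """Levinson-Durbin recursion: solve the Toeplitz normal equations
    sum_k a_k R(|i-k|) = R(i), 1 <= i <= p, in O(p^2) operations."""
    E = R[0]                      # E^(0) = R(0)
    a = np.zeros(p + 1)           # a[j] holds alpha_j; a[0] is unused
    for i in range(1, p + 1):
        # k_i = (R(i) - sum_j alpha_j^(i-1) R(i-j)) / E^(i-1)
        k = (R[i] - np.dot(a[1:i], R[i - 1:0:-1])) / E
        a_new = a.copy()
        a_new[i] = k              # alpha_i^(i) = k_i
        for j in range(1, i):     # alpha_j^(i) = alpha_j^(i-1) - k_i alpha_{i-j}^(i-1)
            a_new[j] = a[j] - k * a[i - j]
        a = a_new
        E *= (1 - k * k)          # E^(i) = (1 - k_i^2) E^(i-1)
    return a[1:], E               # predictor coefficients and final error

# Check against a direct solve of the Toeplitz system.
R = np.array([2.0, 1.0, 0.5, 0.25])
a, E = durbin(R, p=3)
```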
We can compute the spectral envelope magnitude from the LPC parameters by evaluating the transfer function S(z) for z = e^(jω):

  S(e^(jω)) = G / A(e^(jω)) = G / ( 1 − Σ_{k=1}^{p} a_k e^(−jωk) )
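Evaluating |S(e^(jω))| = G/|A(e^(jω))| on a uniform frequency grid amounts to an FFT of the coefficient vector of A(z). A NumPy sketch (illustrative; G = 1 assumed):

```python
import numpy as np

def lpc_envelope_db(a, nfft=512, G=1.0):
    """Spectral envelope 20*log10( G / |A(e^{jw})| ) from LPC coefficients
    a_1..a_p, with A(z) = 1 - sum_k a_k z^{-k}."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    # Zero-padded FFT evaluates A on nfft points around the unit circle;
    # rfft keeps the nfft//2 + 1 non-negative frequencies.
    Aw = np.fft.rfft(A, nfft)
    return 20 * np.log10(G / np.abs(Aw))

# A single positive coefficient gives a low-pass envelope:
# the peak sits at frequency zero, where |A| = 1 - 0.9 = 0.1.
env = lpc_envelope_db([0.9], nfft=256)
```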
Finding the frequency envelope using the LPC method (MATLAB):

for col = 1:nbFrame
    % compute the Or-th order autocorrelation function:
    rx = zeros(1, Or+1)';
    speech1 = M2(:, col)' + 0.000001;
    for i = 1:Or+1
        rx(i) = rx(i) + speech1(1:n-i+1) * speech1(1+i-1:n)';
    end
    % prepare the Or-by-Or Toeplitz autocorrelation matrix:
    covmatrix = zeros(Or, Or);
    for i = 1:Or
        covmatrix(i, i:Or) = rx(1:Or-i+1)';
        covmatrix(i:Or, i) = rx(1:Or-i+1);
    end
    % solve the "normal equations" for the prediction coefficients
    Acoeffs = -covmatrix \ rx(2:Or+1);
    Alp = [1, Acoeffs'];  % LP polynomial A(z)
    dbenvlp(:, col) = 20*log10(abs(freqz(1, Alp, n*2)'));  % envelope in dB
end
Dynamic Time Warping (DTW)
• The SELP analysis is evaluated using dynamic time warping.
• T = {t1, t2, …, ti, …, tn},  R = {r1, r2, …, rj, …, rm}
• T = test signal (unknown signal)
• R = reference signal (known signal)
• A matrix of m × n is created.
• C(x, y) = MIN[ C(x + 1, y), C(x + 1, y + 1), C(x, y + 1) ] + D(x, y)
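A minimal DTW cost computation in Python (illustrative; it uses the forward form of the recurrence, accumulating from C(1,1) rather than the backward form shown above, with D taken here as the squared Euclidean distance between frame vectors):

```python
import numpy as np

def dtw_distance(T, R):
    """Accumulated DTW cost between test frames T (n x d) and
    reference frames R (m x d)."""
    T, R = np.atleast_2d(T), np.atleast_2d(R)
    n, m = len(T), len(R)
    # C[i, j] = minimum cost of aligning the first i test frames
    # with the first j reference frames; row/column 0 are sentinels.
    C = np.full((n + 1, m + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.sum((T[i - 1] - R[j - 1]) ** 2)   # local distance D
            C[i, j] = d + min(C[i - 1, j],           # insertion
                              C[i - 1, j - 1],       # match (diagonal)
                              C[i, j - 1])           # deletion
    return C[n, m]

# Identical sequences align with zero cost.
cost_same = dtw_distance(np.array([[0.], [1.], [2.]]),
                         np.array([[0.], [1.], [2.]]))
```

The recognized word is then the reference whose accumulated cost is the minimum over all stored templates.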
Dynamic Time Warping (DTW)
