Speaker Recognition System Based On VQ in MATLAB Environment
Yanxiang Geng, Guangyan Wang, Cheng Zhu, Teng Fei, and Xiaopei Liu
1 Introduction
J. Lei et al. (Eds.): AICI 2012, CCIS 315, pp. 494-501, 2012.
© Springer-Verlag Berlin Heidelberg 2012
[Figure: block diagram of the speaker recognition system, with blocks labeled Speaker 1 to Speaker N, Feature extraction, Train, and Threshold T.]
$$ H(z) = \frac{1}{A(z)} = \frac{1}{1 + \sum_{j=1}^{p} a_j z^{-j}} \tag{1} $$
Here $p$ is the order of the linear predictor. Let $h(n)$ be the impulse response of the model and $h'(n)$ its cepstrum, which is the quantity to be calculated. According to the definition of the cepstrum:
$$ \ln H(z) = H'(z) = \sum_{n=1}^{\infty} h'(n) z^{-n} \tag{2} $$
The order $p$ is generally taken as 14, which is smaller than the number of samples in one frame of speech, so the LPCC feature is represented by only the first $p$ values of $h'(n)$ ($n = 1, 2, \ldots, p$).
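The cepstrum in equation (2) can be computed from the predictor coefficients without evaluating a logarithm, using the standard recursion obtained by differentiating $\ln H(z)$ with respect to $z^{-1}$. The following is a minimal sketch under the sign convention $A(z) = 1 + \sum_j a_j z^{-j}$; the function name `lpc_to_lpcc` and the example coefficient are hypothetical and are not taken from the paper.

```python
def lpc_to_lpcc(a, n_ceps):
    """Return [h'(1), ..., h'(n_ceps)] for H(z) = 1 / (1 + sum_j a[j-1] * z^-j).

    Assumed recursion (standard LPC-to-cepstrum conversion):
        h'(n) = -a_n - sum_{k=1}^{n-1} (1 - k/n) * a_k * h'(n-k),
    with a_n = 0 for n > p.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[n] stores h'(n); index 0 is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, min(n - 1, p) + 1):
            acc += (1.0 - k / n) * a[k - 1] * c[n - k]
        c[n] = -acc
    return c[1:]


# Illustrative check with a first-order predictor a_1 = -0.5,
# for which ln H(z) = sum_n (0.5**n / n) z^-n in closed form.
lpcc = lpc_to_lpcc([-0.5], 3)
```

For a one-pole model the result can be compared against the closed-form series, which is a convenient sanity check before applying the recursion to a 14th-order predictor.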
The LPCC parameters extracted from the test speech of the speaker Li Ting are shown in Fig. 2.
In practice, LPCC often needs post-processing to obtain better recognition results: for example, multiplying each cepstral coefficient by an appropriate weighting factor, or computing first-order and second-order differences of the current LPCC.
$$ \mathrm{Mel}(f) = 2595 \lg\left(1 + \frac{f}{700}\right) \tag{4} $$
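The Mel mapping is easy to check numerically. A minimal sketch, assuming the common form of the Mel scale, $2595 \log_{10}(1 + f/700)$ with $f$ in Hz; the function names `hz_to_mel` and `mel_to_hz` are illustrative.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse mapping, useful for placing filter centres evenly in Mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# 1000 Hz lies close to 1000 Mel on this scale.
mel_1k = hz_to_mel(1000.0)
```

The inverse function is what a filter-bank design would typically use: centre frequencies are spaced uniformly in Mel and then converted back to Hz.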
where $x(n)$ is the input speech signal and $N$ is the number of points of the Fourier transform.
2) The discrete spectrum $X(k)$ from equation (5) is passed through the Mel-frequency filter bank to obtain the logarithmic spectrum $S(m)$.
3) The logarithm of each filter's output energy is:

$$ S(m) = \ln\left( \sum_{k=0}^{N-1} |X(k)|^2 H_m(k) \right), \quad 0 \le m < M \tag{6} $$
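4) The discrete cosine transform of $S(m)$ gives the MFCC:

$$ C(n) = \sum_{m=0}^{M-1} S(m) \cos\left( \frac{\pi n \left(m + \frac{1}{2}\right)}{M} \right), \quad 0 \le m < M \tag{7} $$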
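The last two steps, the logarithm of the filter-bank energies followed by a cosine transform, can be sketched as below. This is a sketch under stated assumptions: the power spectrum and the filter weights $H_m(k)$ are supplied by the caller, every filter output is strictly positive, and coefficients are computed for $n = 1, \ldots, n_{ceps}$. The function names are hypothetical.

```python
import math


def log_filterbank_energies(power_spectrum, filters):
    """S(m) = ln(sum_k |X(k)|^2 * H_m(k)) for each filter H_m.

    power_spectrum: list of |X(k)|^2 values.
    filters: list of M weight lists, each the same length as power_spectrum.
    Assumes every filter output energy is strictly positive.
    """
    return [
        math.log(sum(p * h for p, h in zip(power_spectrum, weights)))
        for weights in filters
    ]


def mfcc_from_log_energies(S, n_ceps):
    """C(n) = sum_m S(m) * cos(pi * n * (m + 0.5) / M) for n = 1..n_ceps."""
    M = len(S)
    return [
        sum(S[m] * math.cos(math.pi * n * (m + 0.5) / M) for m in range(M))
        for n in range(1, n_ceps + 1)
    ]


# Toy input: a flat log spectrum carries no spectral shape,
# so the cosine transform returns coefficients near zero.
flat = mfcc_from_log_energies([1.0, 1.0, 1.0, 1.0], 3)
```

The flat-spectrum case illustrates why the cosine transform is useful here: it removes the constant level and keeps only the shape of the log spectrum.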
5) The MFCC obtained directly from equation (7) are static features. Because they mainly reflect the static characteristics of the speech signal, dynamic characteristics can be derived from the static ones by taking first-order and second-order differences.
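A minimal sketch of the differencing step, assuming a plain forward difference between consecutive frames (many systems use a regression over several frames instead, which may be what the authors used); the function name `delta` and the sample frames are hypothetical.

```python
def delta(frames):
    """First-order difference along time: out[t] = frames[t+1] - frames[t].

    frames: list of feature vectors (one list of floats per frame).
    Returns one frame fewer than the input.
    """
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


# Hypothetical static features for three frames, two coefficients each.
static = [[1.0, 2.0], [2.0, 4.0], [4.0, 8.0]]
first_order = delta(static)        # dynamic features
second_order = delta(first_order)  # difference of the difference
```

Applying the same function twice yields the second-order difference, so the static, first-order, and second-order features can be concatenated frame by frame once their lengths are aligned.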
The MFCC parameters extracted from the test speech of the speaker Li Ting are shown in Fig. 3.
$$ C^{(i)} = \{ C_1^{(i)}, C_2^{(i)}, \ldots, C_L^{(i)} \} $$

If there are $N$ individuals, $N$ codebooks $C^{(i)}$ ($i = 1, 2, \ldots, N$) are needed in the training stage.
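Training one codebook per speaker can be sketched as follows. This assumes a plain k-means (Lloyd) iteration with squared Euclidean distance and a deterministic seed taken from the first $L$ training vectors; the codebook design procedure actually used by the authors may differ, and all names here are hypothetical.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def train_codebook(vectors, L, n_iter=20):
    """Return L code vectors for one speaker's training features.

    Assumption: the codebook is seeded with the first L training vectors.
    """
    codebook = [list(v) for v in vectors[:L]]
    for _ in range(n_iter):
        # Assign every training vector to its nearest code vector.
        clusters = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)), key=lambda l: sq_dist(v, codebook[l]))
            clusters[nearest].append(v)
        # Move each code vector to the centroid of its cluster.
        for j, members in enumerate(clusters):
            if members:  # keep the old code vector if the cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


# One codebook per enrolled speaker (toy two-dimensional features).
training = {"speaker_1": [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]}
codebooks = {name: train_codebook(feats, L=2) for name, feats in training.items()}
```

Seeding from the first vectors keeps the sketch deterministic; a practical system would more likely use a splitting or randomized initialization.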
2.2.2 Identification Process
1) A sequence of feature vectors $X_1, X_2, \ldots, X_M$ is extracted from the test speech.
2) This sequence of feature vectors is vector-quantized with each template (codebook), and the average quantization error is calculated:
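$$ D_i = \frac{1}{M} \sum_{n=1}^{M} \min_{1 \le l \le L} \left[ d\left(X_n, Y_l^i\right) \right] \tag{8} $$

where $Y_l^i$ ($l = 1, 2, \ldots, L$; $i = 1, 2, \ldots, N$) is the $l$-th code vector in the $i$-th codebook, and $d(X_n, Y_l^i)$ is the distance between the test vector $X_n$ and the code vector $Y_l^i$.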
3) Finally, the speaker whose codebook yields the smallest average quantization error is chosen as the recognition result.
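The three identification steps above can be sketched as follows. This is a minimal sketch, assuming Euclidean distance for $d(\cdot,\cdot)$ (the distance measure is an assumption here) and codebooks stored in a dictionary keyed by speaker name; the names and toy data are hypothetical.

```python
import math


def euclidean(x, y):
    """Distance d(X_n, Y_l) between a test vector and a code vector."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_quantization_error(test_vectors, codebook):
    """D_i: mean over test vectors of the distance to the nearest code vector."""
    total = sum(min(euclidean(x, y) for y in codebook) for x in test_vectors)
    return total / len(test_vectors)


def identify(test_vectors, codebooks):
    """Return the speaker with the smallest D_i, plus all errors."""
    errors = {
        name: average_quantization_error(test_vectors, cb)
        for name, cb in codebooks.items()
    }
    return min(errors, key=errors.get), errors


# Toy codebooks for two enrolled speakers and a short test sequence.
books = {"A": [[0.0, 0.0], [1.0, 1.0]], "B": [[5.0, 5.0], [6.0, 6.0]]}
best, errs = identify([[0.1, 0.0], [0.9, 1.0]], books)
```

Returning all the errors alongside the winner makes it straightforward to compare the smallest error against a rejection threshold if one is used.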
The recognition system supports both text-dependent (correlated) and text-independent (uncorrelated) speaker identification. Through the selection and training modules, the characteristic parameters support not only single-person speech recognition but also multi-person speech
This module contains two sub-modules: recognition module for single-person and
4 Conclusions
This paper uses a vector quantization identification algorithm to extract and match speakers' features, thereby achieving speaker recognition. With LPCC and MFCC as the improved feature parameters, the VQ-based speaker recognition system can identify the speaker quickly, effectively, and accurately; its efficiency is confirmed by MATLAB simulation.
Acknowledgments. This work was supported by a grant from the University Science
and Technology Development Fund of Tianjin (No. 20080710).