Speaker Recognition System Based on VQ in MATLAB Environment

Yanxiang Geng, Guangyan Wang, Cheng Zhu, Teng Fei, and Xiaopei Liu

Information Engineering College


Tianjin University of Commerce
Tianjin, China
gengyanxiang@163.com

Abstract. Speaker recognition is a widely applied biometric technology. To improve the effectiveness and reliability of recognition systems, this paper combines two feature parameters, Mel Frequency Cepstrum Coefficients (MFCC) and Linear Prediction Cepstrum Coefficients (LPCC), to implement a speaker identification system based on the Vector Quantization (VQ) method in the MATLAB environment. The proposed system can carry out both text-independent and text-dependent speaker recognition. Simulation results demonstrate the good performance and rich functionality of the proposed system.

Keywords: Speaker Recognition, VQ, LPCC, MFCC, MATLAB.

1 Introduction

The voice signal is the most important medium of human communication. Voiceprint recognition, generally called speaker recognition, is a biometric technology that has developed rapidly in recent years and has broad application prospects in fields such as electronic business and information security. Different from speech recognition, speaker recognition does not pay attention to the text symbols or semantic content carried by the voice signal, but focuses on how to extract the speaker's personal characteristics from it. Speaker recognition comprises two research tasks: speaker identification and speaker verification [1]. A speaker recognition system can be either text-independent or text-dependent, depending on the application. In the text-independent case, there is no restriction on the sentence or phrase to be spoken, whereas in the text-dependent case the input sentence or phrase is fixed for each speaker.
This paper mainly concerns the design and application of a speaker identification system. Speaker identification is a multidisciplinary research topic spanning applied physiology, speech signal processing, pattern recognition, and artificial intelligence [2]. System performance mainly relies on two aspects: feature parameter extraction, and model training and recognition algorithms. Although many feature extraction parameters have been adopted in speaker recognition systems, such as Mel Frequency Cepstrum Coefficients (MFCC) [3], Linear Prediction Cepstrum Coefficients (LPCC) [3], and other improved parameters [2], MFCC and LPCC remain the most classical and well-established. The training and recognition algorithms of speaker recognition technology are likewise varied; at present the popular methods include Vector Quantization (VQ) [4], Dynamic Time Warping (DTW), Hidden Markov Models (HMM) [5], and Gaussian Mixture Models (GMM) [4]. In this paper, we select LPCC and MFCC as feature parameters to construct a speaker identification system based on the VQ method in the MATLAB GUI environment. The designed system implements both text-dependent and text-independent speaker identification, with a high identification rate and convenient operation.

J. Lei et al. (Eds.): AICI 2012, CCIS 315, pp. 494-501, 2012.
© Springer-Verlag Berlin Heidelberg 2012

2 VQ-Based Speaker Recognition System

To understand the VQ recognition system in depth, this section analyzes the VQ identification algorithm by expounding each of its parts. In VQ recognition, the voice of every person to be tested is treated as a source. Feature vectors are extracted from the speaker's training speech sequence, and a codebook representing each speaker's characteristics is formed by a clustering method. In the identification process, the test feature vectors are quantized with each codebook to determine whether their distribution in feature space is consistent with that codebook. After quantization, the distortion distance is computed and compared with a threshold, and the final result is confirmed. A schematic of the VQ-based speaker recognition system is shown in Fig. 1.

Fig. 1. VQ-based speaker recognition system schematic (training branch: feature extraction from speakers 1..N builds per-speaker codebooks and thresholds T; testing branch: feature extraction followed by a decision step yields the identification result)

The VQ-based speaker recognition system consists of two modules: feature extraction and feature matching.

2.1 Speaker Feature Extraction


The speech signal is time-varying, but over a short period (usually within 5-10 ms) its basic characteristics are relatively stable. Thus, short-term spectral analysis is an important way to characterize the speech signal. In this system, the sampling frequency is 8 kHz and the sample precision is 16 bit.
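As an illustration of this short-time analysis (not taken from the paper), a signal sampled at 8 kHz can be cut into overlapping frames and windowed before any spectral feature is computed. The frame and hop lengths below are typical assumed values, not ones the paper specifies:

```python
import numpy as np

def split_frames(x, frame_len=240, hop=80):
    """Split a 1-D signal into overlapping, Hamming-windowed frames.

    frame_len=240 and hop=80 correspond to 30 ms frames with a 10 ms
    step at the paper's 8 kHz sampling rate (assumed values).
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hamming(frame_len)
    return np.stack([x[i * hop : i * hop + frame_len] * win
                     for i in range(n_frames)])

# one second of signal at 8 kHz -> 30 ms frames every 10 ms
x = np.random.default_rng(0).standard_normal(8000)
frames = split_frames(x)
```

Each row of `frames` is one quasi-stationary segment from which an LPCC or MFCC vector can then be extracted.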

2.1.1 Voice Feature Extraction

In this system, an improved endpoint detection algorithm is applied, Mel Frequency Cepstrum Coefficients (MFCC) and Linear Predictive Cepstral Coefficients (LPCC) are used as the characteristic parameters and extracted from the speech signal, and finally a series of sound vectors is obtained.

2.1.1.1 Linear Predictive Cepstral Coefficients. Linear prediction coefficients are the basic parameters of linear prediction, and they can be transformed into other parameters. Assume the system function of the channel model obtained by linear prediction analysis is as follows:

    H(z) = 1 / A(z) = 1 / (1 + Σ_{j=1}^{p} a_j z^{-j})    (1)

where p is the order of the linear predictor (the LPC order). Let h'(n) denote the cepstrum of the impulse response; it is to be calculated from the definition of the cepstrum:

    ln H(z) = H'(z) = Σ_{n=1}^{∞} h'(n) z^{-n}    (2)

from which a recurrence relation between h'(n) and a_k is obtained:

    h'(0) = 0
    h'(1) = a_1
    h'(n) = a_n + Σ_{k=1}^{n-1} (k/n) a_k h'(n-k),  1 < n ≤ p    (3)
    h'(n) = Σ_{k=1}^{p} (k/n) a_k h'(n-k),  n > p

The order p generally takes the value 14, which is less than the number of sample points in one frame of voice. The LPCC feature therefore only uses the first p values of h'(n) (n = 1, 2, ...).
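The recurrence in (3) translates directly into code. The sketch below follows the paper's formulas as written (sign conventions for the LPC cepstrum vary across texts, so treat this as illustrative rather than definitive); the name `lpcc_from_lpc` is mine, not the paper's:

```python
def lpcc_from_lpc(a, n_ceps=None):
    """Cepstral coefficients h'(1..n_ceps) from LPC coefficients
    a[0..p-1], using the recurrence of equation (3) as written."""
    p = len(a)
    n_ceps = n_ceps or p
    c = [0.0] * (n_ceps + 1)           # c[0] = h'(0) = 0
    for n in range(1, n_ceps + 1):
        if n <= p:
            acc = a[n - 1]             # the direct a_n term, n <= p
            ks = range(1, n)
        else:
            acc = 0.0                  # no direct a_n term for n > p
            ks = range(1, p + 1)
        for k in ks:
            acc += (k / n) * a[k - 1] * c[n - k]
        c[n] = acc
    return c[1:]
```

For example, with p = 1 and a_1 = 0.5 the recurrence gives h'(1) = 0.5 and h'(2) = (1/2)(0.5)(0.5) = 0.125, which matches working the formula by hand.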
The LPCC parameters extracted from the test speaker "Li Ting" are shown in Fig. 2.

Fig. 2. LPCC parameters figure



In practice, to obtain better recognition results, LPCC often needs post-processing. For example, each cepstral coefficient may be multiplied by an appropriate weighting factor, or first-order and second-order differences may be computed on the basis of the current LPCC.

2.1.1.2 Mel Frequency Cepstrum Coefficients. Another voice parameter, MFCC, is based on the characteristics of human hearing. Different from cepstrum analysis on the actual frequency axis, MFCC analysis focuses on human auditory perception. Because perceived pitch and frequency are not linearly proportional, the Mel frequency scale is better suited to human auditory characteristics. The relationship between Mel frequency and actual frequency can be expressed as equation (4):

    Mel(f) = 2595 lg(1 + f/700)    (4)

The MFCC calculation process is as follows:

1) The time-domain signal x(n) of each voice frame is obtained by pre-emphasis and framing of the input sequence s(n); the discrete spectrum X(k) is then calculated by the Discrete Fourier Transform (DFT):

    X(k) = Σ_{n=0}^{N-1} x(n) e^{-j2πnk/N},  0 ≤ k < N    (5)

where x(n) is the input speech frame and N is the number of points of the Fourier transform.

2) The discrete spectrum X(k) from equation (5) is passed through a Mel-frequency filter bank to obtain the logarithmic spectrum S(m).

3) The logarithm of each filter's output energy is

    S(m) = ln( Σ_{k=0}^{N-1} |X(k)|² H_m(k) ),  0 ≤ m < M    (6)

where H_m(k) is the frequency response of the m-th Mel filter and M is the number of filters.

4) The Discrete Cosine Transform (DCT) is used to obtain the MFCC coefficients:

    C(n) = Σ_{m=0}^{M-1} S(m) cos( πn(m + 1/2) / M ),  0 ≤ n < M    (7)

5) The MFCC features obtained directly from equation (7) are static. Since MFCC mainly reflects the static characteristics of the speech signal, its dynamic characteristics can additionally be captured through first- and second-order differences.
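Steps 1)-4) can be sketched end-to-end for a single frame. This is an illustrative reading of equations (5)-(7), not the paper's MATLAB code; the FFT length, filter-bank size, and coefficient count are assumed values:

```python
import numpy as np

def mel(f):        # equation (4)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_inv(m):    # inverse of equation (4)
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, fs=8000, n_fft=256, n_filt=20, n_ceps=12):
    # step 1: discrete power spectrum via the DFT, equation (5)
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    # steps 2-3: triangular Mel filter bank and log energies, equation (6)
    edges = mel_inv(np.linspace(0.0, mel(fs / 2.0), n_filt + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for m in range(1, n_filt + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ctr):
            fbank[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - ctr, 1)
    S = np.log(fbank @ power + 1e-10)
    # step 4: DCT of the log filter-bank energies, equation (7)
    n = np.arange(n_ceps)[:, None]
    m_idx = np.arange(n_filt)[None, :]
    dct = np.cos(np.pi * n * (m_idx + 0.5) / n_filt)
    return dct @ S

frame = np.sin(2 * np.pi * 1000 * np.arange(256) / 8000)  # 1 kHz test tone
coeffs = mfcc_frame(frame)
```

The small constant added before the logarithm guards against filters that receive zero energy; production implementations usually also apply the differencing of step 5) across successive frames.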
The MFCC parameters extracted from the test speaker "Li Ting" are shown in Fig. 3.

Fig. 3. MFCC parameters figure

2.2 Feature Matching

2.2.1 The Training Process

In the training phase, the set of training speech feature vectors from the i-th speaker is denoted Y^(i) = {Y_1^(i), Y_2^(i), ..., Y_T^(i)}, and the speaker's codebook C^(i) = {C_1^(i), C_2^(i), ..., C_M^(i)} is designed from it by the LBG algorithm. If there are N individuals in the system, N codebooks C^(i) (i = 1, 2, ..., N) are needed in the training stage.
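The LBG codebook design named above can be sketched as a split-and-refine loop. This is a generic LBG implementation under assumed parameters (codebook size, split factor, iteration count), not the authors' code:

```python
import numpy as np

def lbg(vectors, size=8, eps=0.01, iters=10):
    """Design a codebook of `size` code vectors by the LBG splitting
    method: start from the global centroid, repeatedly split every
    code vector into a perturbed pair, then refine with
    nearest-neighbour centroid updates."""
    codebook = vectors.mean(axis=0, keepdims=True)
    while codebook.shape[0] < size:
        # split each code vector into a slightly perturbed pair
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            # assign every training vector to its nearest code vector
            d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :],
                               axis=2)
            nearest = d.argmin(axis=1)
            # move each code vector to the centroid of its cell
            for i in range(codebook.shape[0]):
                cell = vectors[nearest == i]
                if len(cell):
                    codebook[i] = cell.mean(axis=0)
    return codebook

# toy training set: one "speaker" whose features form two clusters
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 0.1, (100, 2)),
                   rng.normal(1.0, 0.1, (100, 2))])
cb = lbg(train, size=4)
```

In the real system the rows of `train` would be the LPCC/MFCC vectors of one speaker's training speech, and one such codebook is built per speaker.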

2.2.2 Identification Process

1) A sequence of feature vectors X_1, X_2, ..., X_M is extracted from the test speech.

2) The sequence is quantized with each speaker's codebook, and the average quantization distortion is calculated:

    D_i = (1/M) Σ_{n=1}^{M} min_{1≤l≤L} d(X_n, Y_l^(i))    (8)

where Y_l^(i) (l = 1, 2, ..., L; i = 1, 2, ..., N) is the l-th code vector in the i-th codebook and d(X_n, Y_l^(i)) is the distance between the test vector X_n and that code vector.

3) Finally, the person whose codebook yields the smallest average quantization distortion is chosen as the recognition result.
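Equation (8) and the decision rule of step 3) amount to only a few lines. The sketch below uses fixed toy codebooks rather than trained ones, and Euclidean distance for d(·,·) (the paper does not state its distance measure, so that choice is an assumption):

```python
import numpy as np

def avg_distortion(test_vectors, codebook):
    """Equation (8): mean over the test vectors of the distance to
    the nearest code vector in one speaker's codebook."""
    d = np.linalg.norm(test_vectors[:, None, :] - codebook[None, :, :],
                       axis=2)
    return d.min(axis=1).mean()

def identify(test_vectors, codebooks):
    """Step 3: pick the speaker whose codebook gives the smallest
    average quantization distortion."""
    distortions = [avg_distortion(test_vectors, cb) for cb in codebooks]
    return int(np.argmin(distortions))

# toy codebooks for two "speakers" occupying different feature regions
codebooks = [np.array([[0.0, 0.0], [0.2, 0.1]]),
             np.array([[5.0, 5.0], [5.2, 4.9]])]
rng = np.random.default_rng(1)
test = rng.normal(5.0, 0.1, (50, 2))   # utterance from speaker 1's region
speaker = identify(test, codebooks)     # -> 1
```

In the full system, `codebooks` would hold the N LBG-trained codebooks and `test` the feature-vector sequence of the unknown utterance.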

3 Speaker Recognition System in MATLAB

3.1 The Establishment of Speech Database


In this experiment, a speech database was established by recording voice data from 22 participants (12 men, 10 women). Every participant read a long Chinese sentence, a phrase, and their own name at medium speed. The long sentence is "School of Information Engineering, Tianjin University of Commerce, speaker recognition"; the phrase is "everything is possible". The duration of the sentence is 2.5 s and that of the phrase is 1 s. Each item was recorded twice using the recording software of Windows XP and a microphone. The recording format is PCM data with an 8 kHz sampling rate and 16-bit sampling accuracy, and the recordings are saved as WAV files.
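The recording format described above (mono PCM, 8 kHz, 16-bit WAV) can be reproduced outside MATLAB; for instance, Python's standard `wave` module writes and reads exactly such files. This is a generic illustration, not part of the paper's system:

```python
import math
import struct
import wave

RATE, WIDTH = 8000, 2          # 8 kHz sampling rate, 16-bit samples

# write one second of a 440 Hz test tone as mono 16-bit PCM
samples = [int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / RATE))
           for n in range(RATE)]
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(WIDTH)
    w.setframerate(RATE)
    w.writeframes(struct.pack("<%dh" % len(samples), *samples))

# read the file back and check the header matches the recording format
with wave.open("tone.wav", "rb") as r:
    assert r.getframerate() == RATE
    assert r.getsampwidth() == WIDTH
    assert r.getnframes() == RATE
```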
The system is designed using MATLAB's GUI (Graphical User Interface) facilities, which users can operate easily. The interface of the VQ-based voice recognition system is shown in Fig. 4.

Fig. 4. Interface of the VQ-based voiceprint recognition system

The recognition system can perform both text-dependent and text-independent speaker identification. Through the parameter selection and training modules, it achieves not only single-person speech recognition but also multi-person speech recognition.

3.2 Training Module


First, a voice type is selected from the existing speech database (including text-independent long sentences, phrases, and isolated words). Then the voice button is selected to carry out model training on the voice characteristic parameters. Through training and learning, a reference model codebook is built from each speaker's training voice. When training is completed, the system pops up a hint window (Fig. 5).

Fig. 5. Prompt dialog box



3.3 Recognition Module

This module contains two sub-modules: single-person recognition and multi-person recognition.

3.3.1 Single Identification


After the test voice is loaded, the characteristics of the speaker's voice are extracted and compared with the reference model codebooks previously established by training. The matching person identified from the voice library is shown on the screen (see Fig. 6).

Fig. 6. Single-person recognition result box

3.3.2 Multiple Identification


Through feature extraction and pattern matching, the 22 test voice samples are matched one-to-one against the 22 references in the voice database. The recognition results are saved as a text-format file (Fig. 7).

Fig. 7. Multi-person recognition result box



4 Conclusions

This paper makes use of a vector quantization identification algorithm to extract and match speakers' features, thereby achieving speaker recognition. Using LPCC and MFCC as the feature extraction parameters, the VQ-based speaker recognition system can identify speakers quickly, effectively, and accurately, as demonstrated by the MATLAB simulation.

Acknowledgments. This work was supported by a grant from the University Science
and Technology Development Fund of Tianjin (No. 20080710).

References

[1] Ramachandran, R.P., Farrell, K.R., Ramachandran, R., Mammone, R.J.: Speaker recognition: general classifier approaches and data fusion methods. Pattern Recognition 35(12), 2801-2821 (2002)
[2] Li, Z., Yang, Q.: Speaker Recognition System Based on Weighted Feature Parameter. Physics Procedia 25, 1515-1522 (2012)
[3] Yu, R.-L., Zhang, J.-C.: Speaker recognition method using MFCC and LPCC features. Computer Engineering and Design 30(5), 1189-1191 (2009)
[4] Hanilçi, C., Ertaş, F.: Comparison of the impact of some Minkowski metrics on VQ/GMM based speaker recognition. Computers & Electrical Engineering 37(1), 41-56 (2011)
[5] Tolba, H.: A high-performance text-independent speaker identification of Arabic speakers using a CHMM-based approach. Alexandria Engineering Journal 50(1), 43-47 (2011)
