
International Journal of Computer Applications (0975 – 8887)

National Conference on Latest Initiatives& Innovations in Communication and Electronics (IICE 2016)

Designing a Real Time Speech Recognition System using MATLAB

Neha Sharma
Student, ME-EC
Department of Electronics & Communications,
Chandigarh University, Mohali, Chandigarh, India

Shipra Sardana
Assistant Professor
Department of Electronics & Communications,
Chandigarh University, Mohali, Chandigarh, India

ABSTRACT
A real-time speech-to-text conversion system converts uttered words into text immediately after the utterance. This paper introduces a way for humans to interact with a computer through natural language processing, in the form of a speech recognition system. Nine voice samples were recorded through a microphone and the system was trained on the recorded samples. MFCC features of the speech samples were calculated, and words were distinguished according to the energies associated with each sampled word. The system provides high accuracy in text conversion.

Keywords
Speech Recognition System

1. INTRODUCTION
Speech recognition is an important application of Natural Language Processing (NLP). Speech is the most important part of communication; we express our ideas through a specific language, and computers understand our natural language by means of speech recognition. Speech (word-by-word) recognition is the process of automatically extracting and determining the linguistic information conveyed by a speech wave using computers. Linguistic information, the most important information in a speech wave, is called phonetic information. The term speech recognition means recognizing the spoken words only; the recognition system has no idea what those words mean. It only knows that they are words and which words they are. To be of any use, the words must be passed on to higher-level software for syntactic and semantic analysis. Speech recognition is a pattern recognition technique in which acoustic signals are tested and framed into phonetics (words, phrases and sentences) [1]. To perform such a task one records a voice sample and converts it into .wav format. Spectrum-based parameters are obtained when a word is recognized; about twenty-four parameters can be obtained from the analysis of the spectrum of a speech signal. These parameters are the mean, median, standard deviation (STD), root mean square (RMS), maximum peak, minimum peak, slope of the maximum peak, width of the maximum peak, signal-to-noise ratio, peak frequency, peak amplitude, total power, total harmonic distortion (THD), THD+noise, intermodulation distortion (IMD), etc. Various statistical methods are used for the analysis of words and give each word specific values; a word's parameters fluctuate within a bounded range. One of the important tasks in improving the word recognition process is to find the most informative parameters of the speech signal. Techniques used for this purpose include Linear Predictive Coding coefficients (LPC) and Mel Frequency Cepstral Coefficients (MFCC) [2]. Using such techniques, a new spectrum is obtained that differs from the original spectrum of the spoken words.

2. DESIGN AND DEVELOPMENT OF OUR SPEECH RECOGNITION SYSTEM
The development of our speech recognition system is divided into two stages: a training stage and a testing stage.

[Figure 2.1, flowchart. Training stage: record speech samples, extract MFCC features and store them, then train the system. Testing stage: take real-time speech input, divide the speech sample into frames, calculate the energy of each frame, analyze the frame energies to separate words, then extract features and classify against the prerecorded database to produce real-time text output.]

Figure 2.1: Speech recognition process

3. TRAINING STAGE
In this stage a database is created from speech samples recorded by the user. The recorded speech samples are stored in .wav format in MATLAB. After this stage, the speech recognition system must be trained.

3.1 Feature Extraction
Feature extraction converts the speech waveform into parametric information for further analysis and processing; this is often referred to as the signal-processing front end. The speech signal is a slowly time-varying signal: when examined over a sufficiently short period of time, its characteristics are fairly stationary, but over longer periods (on the order of 1/5 s or more) the signal characteristics change to reflect the different speech sounds being spoken. Therefore, short-time spectral analysis is the most common way to characterize the speech signal, and for this we use MFCC features.

3.1.1 Mel Frequency Cepstral Coefficients (MFCCs)
We used MFCC features for this system. The 'Mel' in MFCC refers to the melody of a speech signal. MFCC features are based on human-ear perception, which means


the human ear's critical-bandwidth filters are spaced linearly at low frequencies and logarithmically at high frequencies of the speech signal, capturing the useful information of that particular signal. Human perception of the frequency content of speech signals follows a nonlinear scale; that is why pitch is measured on the Mel scale. The Mel-frequency scale is linear below 1000 Hz and logarithmic above 1000 Hz [3]. The Mel frequency is calculated as:

Mel(f) = 2595 * log10(1 + f / 700)    (1)
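Equation (1) can be checked with a few lines of Python (a sketch of the same mapping, not taken from the paper's MATLAB code; the inverse function is an addition, useful when placing Mel-spaced filterbank edges):

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the Mel scale, Eq. (1)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of Eq. (1): Mel value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

Note that hz_to_mel(1000) is approximately 1000, matching the stated behavior: near-linear spacing below 1000 Hz, logarithmic above.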

3.2 CREATING THE DATABASE
To recognize the word uttered by the speaker, a database is created that represents each pronounced word. To create this database, we first recorded the numerals one to nine and obtained the following spectra:
Figure 3.2.1: Spectrum of 'one'

Figure 3.2.2: Spectrum of 'two'

Figure 3.2.3: Spectrum of 'three'

Figure 3.2.4: Spectrum of 'four'

Figure 3.2.5: Spectrum of 'five'
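The record-and-store step of the training stage (Section 3) can be sketched in Python; the paper used MATLAB, so this is an equivalent stdlib-only sketch, and the 8 kHz rate (from Section 3.3), the filename, and the synthetic tone standing in for a microphone capture are illustrative assumptions:

```python
import math
import struct
import wave

FS = 8000  # sampling frequency used in the paper's training stage (8 kHz)

def save_wav(path, samples, fs=FS):
    """Store a mono speech sample as a 16-bit PCM .wav file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(fs)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        w.writeframes(frames)

# Stand-in for one recorded numeral: a 0.5 s, 300 Hz tone.
tone = [0.3 * math.sin(2 * math.pi * 300 * n / FS) for n in range(FS // 2)]
save_wav("one.wav", tone)
```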


Figure 3.2.6: Spectrum of 'six'

Figure 3.2.7: Spectrum of 'seven'

Figure 3.2.8: Spectrum of 'eight'

Figure 3.2.9: Spectrum of 'nine'

3.3 TRAINING OF THE VOICE SAMPLES
The speech recognition system must be trained before use; training on the speech samples is a necessary part of the system. We trained our speech samples at a sampling frequency of 8 kHz, and the duration of the training can be varied from 20 s. After the training of the speech samples, the system separates the frames of the speech signal with high energy from those with low energy. Figure 3.3 shows the training sequence of the speech samples:

Figure 3.3: Training of the speech samples

4. EXPERIMENTAL TESTING
Our speech recognition system is speaker dependent, so it responds to the user's voice only. In the training of this system we created a database of nine words. After training, a real-time speech input was given to the system through a good-quality microphone. The system divided the real-time speech sample into small frames, i.e. contiguous groups of samples. The energy of each frame was then calculated using the simple energy formula:

Ex = Σn |x(n)|²    (2)

The calculated energy was then analyzed by a speech detection algorithm to separate the words.
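The framing and per-frame energy of Equation (2) can be sketched as follows (a stdlib-only Python sketch, not the paper's MATLAB code; the 160-sample frame length is taken from Section 4.1 and corresponds to 20 ms at 8 kHz):

```python
def split_frames(x, frame_len=160):
    """Divide a speech sample into contiguous, non-overlapping
    frames of frame_len samples (160 samples = 20 ms at 8 kHz)."""
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, frame_len)]

def frame_energy(frame):
    """Eq. (2): Ex = sum over the frame of |x(n)|^2."""
    return sum(s * s for s in frame)

# Example: a silent frame has zero energy, a loud one does not.
signal = [0.0] * 160 + [0.5, -0.5] * 80
energies = [frame_energy(f) for f in split_frames(signal)]
```

Thresholding these per-frame energies is what lets the detection stage separate words from the surrounding silence.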


4.1 SPEECH DETECTION ALGORITHM
The speech detection algorithm processes the prerecorded speech samples frame by frame within a simple loop. We divided the signal into segments of 160 samples, and each segment was examined by the system. For the detection of each frame we used a combination of signal energy and zero-crossing rate. This calculation is straightforward with MATLAB's mathematical and logical operators.

4.2 ACOUSTICAL MODEL
It is very important to create an acoustical model for the detection of each uttered word, so we created one. It is known that different sounds are produced by the human vocal cords and can have different frequencies. The power spectral density is a suitable measure for predicting these frequencies, so we found the frequencies by power spectral density measurements. Since speech can be treated as short-term stationary, MFCC features were again extracted and the words pronounced by the user were detected.

5. RESULTS
Real-time results were obtained in the lab. The user spoke through the microphone and the text representation appeared on the computer screen, as shown in Figure 5.1. Implementation results of the speech-to-text conversion system are as follows:
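The frame-wise detection of Section 4.1, combining energy with zero-crossing rate, can be sketched in Python (the paper implemented this in MATLAB; the threshold values here are illustrative assumptions, not values from the paper):

```python
import math

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a < 0) != (b < 0))
    return crossings / max(len(frame) - 1, 1)

def is_speech(frame, energy_thresh=1.0, zcr_thresh=0.25):
    """Label a 160-sample frame as speech when its energy is high
    and its zero-crossing rate is below a noise-like threshold."""
    energy = sum(s * s for s in frame)
    return energy > energy_thresh and zero_crossing_rate(frame) < zcr_thresh

# Voiced-like frame: large amplitude, few sign changes per 20 ms.
voiced = [0.8 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
silence = [0.001] * 160
```

Combining the two measures helps reject low-energy noise (fails the energy test) as well as high-frequency hiss (fails the zero-crossing test).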

Figure 5.1: STT Conversion of Four and Seven

Figure 5.2: STT Conversion of Eight and Nine


Figure 5.3: STT Conversion of Seven and Eight

6. CONCLUSION
In this project nine words were collected and analyzed. Words were distinguished by the energies associated with them, and the system was able to separate the words according to those energies, with the final output produced as text. Using this code, the system can be trained for more words and paragraphs. Every word parameter varies within a bounded range, and each word has a specific range of these parameters. Some words differ yet still share parameters, e.g. 'seven' and 'one': the word 'seven' contains 'one' at its end, so the two sometimes sound alike and the system occasionally outputs 'one' when 'seven' is pronounced. Such ambiguities can be reduced by taking a large number of samples for each particular word.

The system is also very sensitive to noise; this can be addressed in future work. It is likewise very sensitive to word pronunciation during training: the words recorded to create the database and the words spoken later should be pronounced similarly, and the system is sensitive to the tone of pronunciation.

7. REFERENCES
[1] J. D. Tardelli, C. M. Walter, "Speech waveform analysis and recognition process based on non-Euclidean error minimization and matrix array processing techniques", IEEE ICASSP, pp. 1237-1240, 1986.
[2] Takao Suzuki, Yasuo Shoji, "A new speech processing scheme for ATM switching systems", IEEE, Digital Communications Laboratories, Oki Electric Industry Co. Ltd., Japan, pp. 1515-1519, 1989.
[3] Siva Prasad Nandyala, T. Kishore Kumar, "Real Time Isolated Word Speech Recognition System for Human Computer Interaction", International Journal of Computer Applications, Volume 12, November 2010.
[4] Jeong, S., Hahn, M., "Speech quality and recognition rate improvement in car noise environments", Electronics Letters, 37(12), pp. 800-802, 2001.
[5] Ma, J., Deng, L., "Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model", IEEE Trans. Speech Audio Process., 11(6), pp. 590-602, 2003.
[6] Rohit Ranchal, Teresa Taber-Doughty, Yiren Guo, Keith Bain, Heather Martin, J. Paul Robinson, and Bradley S. Duerstock, "Using Speech Recognition for Real-Time Captioning and Lecture Transcription in the Classroom", IEEE Transactions on Learning Technologies, Vol. 6, No. 4, October-December 2013.
[7] Daryl Ning, "Developing an isolated word recognition system in MATLAB", The MathWorks, Inc., 2009.
[8] Deepak Baby, Tuomas Virtanen, Jort F. Gemmeke, Hugo Van hamme, "Coupled Dictionaries for Exemplar-Based Speech Enhancement and Automatic Speech Recognition", IEEE/ACM Trans. on Audio, Speech and Language Processing, Vol. 23, No. 11, 2015.
[9] Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, and Hiroshi G. Okuno, "Automatic Speech Recognition for Mixed Dialect Utterances by Mixing Dialect Language Models", IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 23, No. 2, February 2015.
[10] Shaila D. Apte, "Speech and Audio Processing", Wiley India, 2013.

