Speech Recognition Using Python
SPEECH RECOGNITION USING PYTHON
Project report submitted in partial fulfillment of the requirement for the degree of
Bachelor of Technology
in
Computer Science and Engineering / Information Technology
By
This is to certify that the work in this project, titled “AN IMPLEMENTATION OF
SPEECH RECOGNITION”, was entirely written, successfully completed and
demonstrated by the following students themselves in fulfillment of the requirement
for the degree of Bachelor of Engineering in Computer Science.
Sign:
Sign:
Student Name: Akshit Chaudhary
Roll no: 171271
This is to certify that the above statement made by the candidate is true
to the best of my knowledge.
TABLE OF CONTENTS
Chapter
1. Introduction
1.1 Introduction
1.2 Objectives
1.3 Problem Statement
1.4 Methodology
1.5 Scope
2. Literature Review
2.1 History
2.2 Analysis
2.3 Speech Recognition type
2.4 Application
3. System Development
3.1 Speech synthesis
3.2 Packages used and Implementation
3.2.1 Working with Audio files
3.2.2 Effect of Noise
3.2.3 Speech Recognition Using Microphone
3.2.4 Guess the Fruit Game
3.2.4.1 Working
3.3 System Development Approach
3.3.2 ACTIVITY DIAGRAM
3.3.3 CLASS DIAGRAM
3.3.4 SEQUENCE DIAGRAM
4. Performance Analysis
4.1 System Requirement
4.1.1 Minimum requirement
4.1.2 Best requirement
4.2 Hardware requirement
4.3 Web search using Speech
4.4 Graphical Representation
5. Conclusion
5.1 Advantages of software
5.2 Disadvantages
5.3 Conclusion
References
LIST OF GRAPH
LIST OF FIGURES
FIG# Topic
1.1 Unified Framework
2.3 Speech Recognition process
3.2 Activity Diagram
3.3 Class Diagram
3.4 Sequence Diagram
4.2.1 Program for Relative search
4.2.2 Recognize The Word
ABSTRACT
This project was designed and developed with that factor in mind, as a
small effort towards this aim. Our project is capable of recognizing
speech and converting it into text.
CHAPTER 1
1.1 INTRODUCTION
The fields of speech technology within the scope of this report are
presented in Fig. 1.1 as a unified framework that encompasses the
covered topics, showing their complementarity, ranges and borders,
interconnections, and intersections in the interdisciplinary area of
speech.
Fig: 1.1 Unified Framework
In most areas of the country, there are many people who do not
know how to read or write, so this project is very helpful for
them. In today’s world nearly everybody has a mobile phone and
wants to search for many things. With this project, they simply
speak what they want to search for, and the matching results open
in the browser window.
In this project, we made our machine recognise speech passed as
an audio file, and also dissect the speech based on the
requirement.
Our aim is to make search fast, efficient and reliable for every
person by implementing basic search commands, to help users
correct their vocabulary easily, and further to implement a
speaking mode like Siri on iPhones.
1.2 Objectives
1.4 Methodology:
1.5 Scope
(LITERATURE REVIEW)
2.1 HISTORY
For most of the decade there were few advancements, until Google
launched Google Voice Search.
This was significant because the processing could be offloaded to
Google's data centres.
Beyond that, the Google application collected data from billions of
searches, which helped it predict what a person is actually saying.
At that time, Google’s English voice search system included 240 billion
words from user searches.
2010s
The early part of the decade saw an explosion of other voice
recognition applications.
The Future
We could not have done this without the work that was done before
us.
Analysis:
Speaking Mode:
This describes how the words are spoken: connected or isolated. An
isolated-word speech recognition system requires the speaker to pause
between the words he or she speaks, so it handles one word at a time.
A connected-word speech recognition system does not require the
speaker to pause briefly between words. It generally handles
full-length sentences in which the words are artificially separated by
silence.
Speaking Style:
This describes whether the speech is in continuous form or
spontaneous form. Continuous speech is spoken in a natural manner,
and systems are usually evaluated on speech read from prepared
scripts. Spontaneous, or extemporaneously generated, speech contains
disfluencies and is much harder to recognise than speech read from a
written script. It tends to be peppered with disfluencies such as
“uuh” and “uum”, incomplete sentences, spluttering, stuttering,
sneezing and coughing, and its vocabulary is essentially unlimited,
so the system must be trained to cope with unknown and unseen
words.
Vocabulary :
It is much simpler to discriminate among a small set of words; the
error rate increases as the size of the vocabulary increases.
For example, the 10 digits from 0 to 9 can easily be recognised
correctly, whereas vocabularies of 100, 4000 or 15000 words have error
rates of 3%, 6% and 40% respectively. A vocabulary is also hard to
recognise if it contains easily confused words.
Enrollment:
Enrollment is of two kinds:
1) Speaker dependent 2) Speaker independent
In a speaker-dependent system the user must provide samples of his or
her speech before using it; such a system is meant for use by a single
speaker. A speaker-independent system is intended for use by any
speaker.
Fig: 2.3 Speech Recognition Process
2.4 APPLICATION
(SYSTEM DEVELOPMENT)
GRAPH 3.1
3. import webbrowser: with this package we can use the default
browser to locate, retrieve and display data. The URL and the
query are passed to the webbrowser package, and the matching
web page opens based on the URL and the query provided.
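The step above can be sketched as follows. This is a minimal
illustration, not the project's own code: the search URL and the
function names build_search_url and search_web are assumptions chosen
for the example.

```python
import webbrowser
from urllib.parse import quote_plus

# Assumed search endpoint; any URL that takes the query as a parameter would do.
SEARCH_URL = "https://www.google.com/search?q="


def build_search_url(query, base_url=SEARCH_URL):
    """Return the URL that shows search results for `query`."""
    # quote_plus escapes spaces and special characters so the URL stays valid.
    return base_url + quote_plus(query)


def search_web(query):
    """Open the results page for `query` in the default browser."""
    # webbrowser.open returns True if a browser could be launched.
    return webbrowser.open(build_search_url(query))
```

Calling search_web("speech recognition") would open the results page
in the default browser; the recognised text can be passed in place of
the literal string.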
Recognizer class: All the major processing for speech recognition occurs
in the Recognizer class. The main purpose of a Recognizer instance is
to recognise speech, and it provides various methods that help in
recognising speech from an audio source.
The path of the audio file is passed as the argument to the AudioFile
class, which also provides a context manager that helps in reading and
working with the file contents.
The context manager is responsible for opening the audio file and
reading its contents into the AudioFile instance. The record() method
is then used to capture the data from the entire audio file into an
AudioData instance.
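A minimal sketch of this flow, assuming the SpeechRecognition package
is installed and an internet connection is available. The function
name transcribe_file is hypothetical, and the use of
recognize_google() is an assumption based on the project's stated
dependence on the Google API.

```python
def transcribe_file(path):
    """Return the text recognised in the audio file at `path`."""
    # Third-party package (pip install SpeechRecognition). It is imported
    # here so the sketch can be loaded before the package is installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()

    # AudioFile is a context manager: the file is opened on entry
    # and closed again on exit.
    with sr.AudioFile(path) as source:
        # record() reads the whole file into an AudioData instance.
        audio = recognizer.record(source)

    # Send the audio to the Google Web Speech API and return the text.
    return recognizer.recognize_google(audio)
```

For example, transcribe_file("sample.wav") would return the spoken
words as a string, where sample.wav is a placeholder file name.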
All audio recordings contain some level of noise, and unhandled noise
can greatly reduce the accuracy of speech recognition apps.
This file has the phrase “smell during periods” spoken with a loud sound
in the background, so the speech cannot be recognised properly.
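One way the library offers to reduce the effect of noise is to
calibrate the recognizer on the start of the recording before reading
the rest. The sketch below is an illustration under that assumption
and is not necessarily the approach taken in this project; the
function name transcribe_noisy_file and the 0.5 second calibration
window are hypothetical.

```python
def transcribe_noisy_file(path, calibration_seconds=0.5):
    """Transcribe a noisy audio file; return None if nothing is understood."""
    # Third-party package (pip install SpeechRecognition), imported lazily.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        # Use the first part of the file to estimate the background noise
        # level. This consumes that part of the stream.
        recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
        # Read the remainder of the file.
        audio = recognizer.record(source)

    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # Raised when the speech is unintelligible, for example when the
        # noise is too loud.
        return None
```

Calibration does not guarantee a correct transcription; with very loud
background noise the result can still be wrong or empty.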
Input For The noisy audio file:
PyAudio is installed to access the microphone, which allows the user
to perform real-time speech recognition. The microphone can then be
used through the Microphone class of the speech_recognition package.
If the user guesses the fruit name correctly, the game announces
the win; otherwise it prints a message to try again if there are any
attempts remaining.
3.2.4.1 Working :
This function first checks the validity of both arguments and
raises a TypeError if either of them is invalid.
Then the listen() method is used to listen to the input from the mic.
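A minimal sketch of such a function, assuming the SpeechRecognition
and PyAudio packages are installed. Only the argument check, the
listen step and the transcription key are described in this report;
the success and error keys, the error messages and the ambient-noise
calibration are assumptions added for illustration.

```python
def recognise_speech_mic(recognizer, microphone):
    """Capture one utterance from the microphone and try to transcribe it."""
    # Third-party package (pip install SpeechRecognition), imported lazily.
    import speech_recognition as sr

    # Check both arguments before touching the hardware.
    if not isinstance(recognizer, sr.Recognizer):
        raise TypeError("recognizer must be a speech_recognition.Recognizer")
    if not isinstance(microphone, sr.Microphone):
        raise TypeError("microphone must be a speech_recognition.Microphone")

    with microphone as source:
        # Assumed extra step: calibrate for background noise first.
        recognizer.adjust_for_ambient_noise(source)
        # Block until a phrase has been spoken, then return it as audio data.
        audio = recognizer.listen(source)

    # Assumed result layout: only "transcription" is named in the report.
    response = {"success": True, "error": None, "transcription": None}
    try:
        response["transcription"] = recognizer.recognize_google(audio)
    except sr.RequestError:
        # The API could not be reached or rejected the request.
        response["success"] = False
        response["error"] = "API unavailable"
    except sr.UnknownValueError:
        # The request worked but the speech was unintelligible.
        response["error"] = "Unable to recognise speech"
    return response
```

Returning a dictionary instead of raising on recognition failures
lets the caller decide whether to prompt the user again or stop.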
The outer for loop of the program runs once for each guess allowed to
the user. The inner for loop attempts to recognise the input each time
by calling the recognise_speech_mic() function and stores the
dictionary returned by this function in a variable.
If the system recognises the word spoken by the user, i.e. the
transcription key is not null, the speech has been transcribed and the
inner loop breaks. If the speech was not transcribed because an API
error occurred, the loop also breaks. If the API request was
successful but the speech was not recognised, the else statement is
executed, which asks the user to speak the word again.
Once the inner loop ends, the returned dictionary is checked for
errors. If an error occurred, the error message is displayed and the
program ends.
If no error occurred when the inner loop ended, the transcription is
compared with the word selected by the system. The lower() method is
used to convert the strings to lowercase, which reduces the
possibility of a wrong answer caused by the speech being transcribed
in upper case.
If the user makes a correct guess that matches the system’s word, the
user wins the game; otherwise the outer loop continues while attempts
remain, and if the user fails on the last attempt, the user loses the
game.
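The game loop described above can be sketched as follows. This is an
illustration under stated assumptions, not the project's source: the
microphone helper is passed in as a callable named listen that returns
a dictionary with transcription, success and error keys (the last two
key names are assumptions), and the parameters num_guesses,
prompt_limit, secret and say are hypothetical.

```python
import random


def play_guess_the_fruit(listen, fruits, num_guesses=3, prompt_limit=5,
                         secret=None, say=print):
    """Play one round and return True if the player guesses the fruit."""
    # The system picks a word unless one is supplied (useful for testing).
    word = secret if secret is not None else random.choice(fruits)

    # Outer loop: one pass per allowed guess.
    for attempt in range(1, num_guesses + 1):
        # Inner loop: ask again while the speech was not understood.
        for _ in range(prompt_limit):
            say(f"Guess {attempt}. Speak!")
            guess = listen()
            if guess["transcription"]:
                break  # the speech was transcribed
            if not guess["success"]:
                break  # the API request failed
            say("I didn't catch that. Please say the word again.")

        # Stop the game if no usable transcription was obtained.
        if not guess["transcription"]:
            say(f"ERROR: {guess['error'] or 'speech was not recognised'}")
            return False

        # Compare in lowercase so capitalisation cannot cause a mismatch.
        if guess["transcription"].lower() == word.lower():
            say("Correct! You win!")
            return True
        if attempt < num_guesses:
            say("Incorrect. Try again.")

    say(f"Sorry, you lose. The word was '{word}'.")
    return False
```

Passing the helper in as a callable keeps the game logic independent
of the microphone, so it can be exercised with canned responses.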
Output For Guess The Fruit Using Speech Recognition Through Microphone:
3.3 System Development Approach :
3.3.1 ACTIVITY DIAGRAM:
(PERFORMANCE ANALYSIS)
4.1.1 Requirements:
a. 1.6 GHz Processor
b. 128 MB RAM
c. Microphones for good audio.
Sound cards:
The proper sound driver must be installed. Speech requires
only low bandwidth, so a good-quality sound card should be
used.
Microphones:
Microphones are the most important tool for real-time
speech-to-text conversion. The built-in ones should not be
used, as they are more prone to background noise and give
poor speech quality.
Computer Processor:
A speech recognition application depends heavily on
processing speed. If the processing speed is low, handling the
user's input takes longer, and the user spends more time
waiting than performing the task, which makes the
application less practical to use.
This system is designed to recognize speech and convert it to text.
The software, named ‘SPEECH RECOGNITION SYSTEM’, writes spoken
words as text.
GRAPH 4.1
(SOURCE: Microsoft)
CHAPTER 5
(CONCLUSION)
5.1 Advantages of Software:
In most areas of the country, there are many people
who do not know how to read or write, so this project is very
helpful for them. In today’s world nearly everybody has a mobile
phone and wants to search for many things. With this project,
they simply speak what they want to search for, and the matching
results open in the browser window.
5.2 Disadvantages:
1. Low accuracy because of its limited capability.
2. Fails in noisy environments.
3. Depends heavily on the Google API and is thus not original software.
4. Only limited operations can be performed.
5.3 Conclusion:
BOOKS:
1. G. L. Clapper, "Automatic word recognition", IEEE Spectrum, pp. 57-59, Aug.
1971.
2. M. B. Herscher, "Real-time interactive speech technology at threshold
technology", Workshop Voice Technol. Interactive Real Time Command Control
Syst. Appl., Dec. 1977.
3. J. W. Gleen, "Template estimation for word recognition", Proc. Conf. Pattern
Recog. Image Processing, pp. 514-516, June 1978.
Internet:
1. https://pypi.org/project/SpeechRecognition
2. https://www.researchgate.net/publication/337155654_A_Study_on_Automatic_Speech_Recognition
3. https://www.ijedr.org/papers/IJEDR1404035.pdf
JAYPEE UNIVERSITY OF INFORMATION TECHNOLOGY, WAKNAGHAT
PLAGIARISM VERIFICATION REPORT
Date: ………………………….
Type of Document (Tick): PhD Thesis / M.Tech Dissertation/Report / B.Tech Project Report / Paper
Name: Akshit Chaudhary and Rahul Tomar
Department: CSE/IT
Enrolment No: 171295, 171271
Contact No.: ______________________________
E-mail: 171295@juitsolan.in and 171271@juitsolan.in
Name of the Supervisor: Dr. Pradeep Kumar
Checked by
Name & Signature Librarian
……………………………………………………………………………………………………………………………………………………………………………
Please send your complete thesis/report in (PDF) with Title Page, Abstract and Chapters in (Word File)
through the supervisor at plagcheck.juit@gmail.com