This document describes an artificial intelligence-based voice assistant created using Python. It summarizes that the voice assistant uses speech recognition to understand voice commands from a microphone input, and can respond verbally using text-to-speech APIs. The assistant is designed to perform common tasks like playing music, accessing websites, and answering questions through natural language processing. The document outlines the methodology used, including splitting voice requests into commands, searching a commands list to understand requests, and then performing the corresponding task. It also describes how acoustic analysis is used in automatic speech recognition to transform speech audio into text that can then be processed as commands.
International Research Journal of Modernization in Engineering Technology and Science
( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:05/Issue:05/May-2023 Impact Factor- 7.868 www.irjmets.com

ARTIFICIAL INTELLIGENCE-BASED VOICE ASSISTANT
Gowhar Ahmad Dar*1, Jeby Tom Kurian*2, Abin K Shaji*3, Chrisil T Jose*4, Dr. Anju ratap*5
*1,2,3,4Students, Dr. A.P.J. Abdul Kalam Technical University Kerala, Computer Science & Engineering, Saintgits College of Engineering (Autonomous), Kottayam, Kerala, India.
*5Head of Department, Department of Computer Science & Engineering, Saintgits College of Engineering (Autonomous), Kottayam, Kerala, India.
DOI : https://www.doi.org/10.56726/IRJMETS40794

ABSTRACT
Artificial intelligence technology is beginning to be actively used in everyday life, which makes its potential easier to envision. Independent devices are becoming smart in the ways they communicate with each other. One of the most useful capabilities of artificial intelligence is the ability to understand human natural language. The ideas in this article may lead to new ways of human-machine interaction, in which the machine learns to understand, adapt to, and interact with the user. Thus, we aim to develop a personal assistant with strong powers of deduction and the ability to interact with its environment through one of the basic forms of human interaction, i.e., voice. Desktop-based voice assistants are applications that can recognize human voices and respond through an integrated voice system. To convert text to audio, we use APIs. We use artificial intelligence technology for this project, and Python as the programming language because it has a large collection of libraries. The software uses a microphone as the input device to receive voice requests from the user and a speaker as the output device to deliver the voice response.
Keywords: Python, Assistant, Integrated voice system, Microphone, Speaker, APIs.

I. INTRODUCTION
This project is based on web application development and provides a personal assistant that is controlled by voice recognition or in text mode. The program includes the following features and services: call services, text message transformation, mail exchange, alarm, event handler, location services, music player service, weather control, Google search engine, Wikipedia search engine, chat robot, camera, Bing translator, Bluetooth headset support, help menu, and Windows Azure cloud computing.
Virtual assistants are very useful for the elderly, for people with disabilities or special needs, and for young children who do not know how to operate machines or smart devices. They make interaction with machines less difficult and also enable users to multitask. Upcoming technology trends such as virtual reality, augmented reality, voice interaction, and IoT are changing the way people engage with the world and are transforming digital experiences. Voice control is one of the major advances in human-machine interaction enabled by progress in artificial intelligence. Nowadays, we are able to train machines to perform tasks on their own, or to think like humans, using technologies such as artificial intelligence and machine learning.
Recently, the rise of voice assistants such as Apple Siri, Google Assistant, Microsoft Cortana, and Amazon Alexa has been notable, driven by the heavy use of smartphones. Voice assistants use technologies such as voice recognition, speech synthesis, and natural language processing (NLP) to provide a variety of services that help users perform tasks on their devices simply by giving voice commands. With a voice assistant, there is no need to type commands again and again to perform a specific task. Voice search is gaining ground over text search: mobile web searches had only just overtaken desktop searches when analysts predicted that by 2022, 50% of searches would be conducted by voice.
Virtual assistants are proving to be smarter than ever. An intelligent assistant can, for example, handle email on the user's behalf by discovering intent, selecting important information, automating processes, and delivering personalized responses. This project was started on the premise that there is enough publicly available data and information on the web to build a virtual assistant capable of intelligent decision making for common user activities.
www.irjmets.com @International Research Journal of Modernization in Engineering, Technology and Science
e-ISSN: 2582-5208
II. METHODOLOGY
Voice assistants are programs that listen to verbal commands and respond according to the user's requests. In this project we used the Python programming language to build the AI-based voice assistant. A user can say "Play me a song" or "Open facebook.com", and the voice assistant responds by playing that song or by opening the Facebook website. The voice assistant waits for a pause to know that the user has finished the request, then sends the request to its database to be looked up. The request is split into separate commands so that the voice assistant can understand it. Within the commands list, the request is searched for and compared with the other requests, and the commands list then sends the matching commands back to the voice assistant. Once the voice assistant receives those commands, it knows what to do next. If the request is not clear enough to process, the voice assistant asks a question to make sure it understands what the user would like to receive. If it judges that it understands enough, it performs the task the user asked for.
Working of ASR: As shown in Figure 1, Automatic Speech Recognition (ASR) is the main principle behind the working of the AI-based voice assistant [4]. An ASR system first records the speech, and the device creates a wave file containing the words it hears. The wave file is then cleaned so that background noise is removed and the volume is normalized. Next, the audio is broken down into elements, which are analysed in sequence. The ASR software examines these sequences and applies statistical probability to determine the whole words, which are then processed into text content [5]. Element recognition is the preferred method because it provides better results than word decoding.
Figure 1: Process of ASR.
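The clean-up and segmentation stages of the ASR process described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the noise threshold, and the frame sizes are hypothetical, the noise gate is only a crude stand-in for real noise removal, and the "elements" are approximated here by fixed-size overlapping frames.

```python
def remove_quiet_samples(samples, threshold=0.02):
    """Crude noise gate (assumption): silence samples whose magnitude is below threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def normalize_volume(samples, target_peak=1.0):
    """Scale the signal so that its loudest sample has magnitude target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent input: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]


def split_into_frames(samples, frame_size, hop_size):
    """Cut the signal into overlapping frames of frame_size samples each."""
    return [
        samples[start:start + frame_size]
        for start in range(0, len(samples) - frame_size + 1, hop_size)
    ]


# Hypothetical recording: a short list of amplitude values between -1 and 1.
recorded = [0.01, 0.1, -0.5, 0.25, -0.01, 0.4]
cleaned = normalize_volume(remove_quiet_samples(recorded))
frames = split_into_frames(cleaned, frame_size=4, hop_size=2)
```

Each frame would then be passed to the analysis stage; a real system would work on audio sampled at thousands of values per second rather than on a six-element list.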
It does not matter what kind of speech recognition software we use, because all the work happens in its ASR. In a nutshell, the method starts with the device gathering audio from the source, which is the microphone. The recorded speech waveforms are then sent to acoustic analysis, which is performed on three different levels, as shown in Figure 2.
Acoustic Analysis
Acoustic Modelling: Represents which elements were pronounced and which words these elements can complete.
Pronunciation Modelling: Analyses how these elements are pronounced, checking for any accent or other peculiarities.
Language Modelling: Aims to find contextual probabilities based on which elements were captured.
All the recorded data is processed by artificial intelligence without any human interaction. The speech waveform data is then transmitted to the decoder, where it is finally transformed into text for further use, such as commands.
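To make the role of language modelling in the decoder more concrete, the following toy sketch combines an acoustic score with a bigram language-model score and keeps the best candidate transcript. Everything in it is an assumption made for illustration: the probabilities are invented, the candidate list is hypothetical, and real decoders search far larger hypothesis spaces.

```python
import math

# Hypothetical bigram probabilities P(word | previous word); "<s>" marks the start.
BIGRAM_PROBS = {
    ("<s>", "play"): 0.20,
    ("play", "a"): 0.30,
    ("a", "song"): 0.25,
    ("<s>", "pray"): 0.01,
    ("pray", "a"): 0.05,
}
UNSEEN_PROB = 1e-4  # small fallback probability for word pairs not in the table


def language_log_prob(words):
    """Sum the log-probabilities of each word given the word before it."""
    total = 0.0
    previous = "<s>"
    for word in words:
        total += math.log(BIGRAM_PROBS.get((previous, word), UNSEEN_PROB))
        previous = word
    return total


def pick_transcript(candidates):
    """Return the word list whose acoustic + language log-score is highest.

    candidates: list of (words, acoustic_log_prob) pairs.
    """
    best_words, _ = max(
        candidates, key=lambda c: c[1] + language_log_prob(c[0])
    )
    return best_words


# Two acoustically similar hypotheses; the second sounds slightly less likely
# but is far more plausible as a word sequence.
candidates = [
    (["pray", "a", "song"], -4.0),
    (["play", "a", "song"], -4.2),
]
best = pick_transcript(candidates)
```

In this example the language model outweighs the small acoustic advantage of the first hypothesis, so `best` is the "play a song" word list.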
Figure 2: Acoustic Analysis.
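Once the decoder has produced text, the request handling described under Methodology (splitting the request, looking it up in a commands list, and asking a question when the request is unclear) can be sketched as follows. The commands list, the filler words, and the response strings are hypothetical; the paper does not specify how its commands list is stored.

```python
import string

# Hypothetical filler words that carry no meaning for the command itself.
FILLER_WORDS = {"me", "a", "an", "the", "please"}

# Hypothetical commands list: keyword -> handler that builds the response.
COMMANDS = {
    "play": lambda target: f"Playing {target}",
    "open": lambda target: f"Opening {target}",
}


def split_request(request):
    """Lower-case the request and split it into words without edge punctuation."""
    words = (w.strip(string.punctuation) for w in request.lower().split())
    return [w for w in words if w]


def handle_request(request):
    """Find the first known command word and apply it to the rest of the request."""
    words = split_request(request)
    for index, word in enumerate(words):
        if word in COMMANDS:
            target = " ".join(
                w for w in words[index + 1:] if w not in FILLER_WORDS
            )
            if not target:
                # The request is not clear enough: ask a follow-up question.
                return f"What would you like me to {word}?"
            return COMMANDS[word](target)
    return "Sorry, I did not understand that. Could you rephrase?"
```

For instance, `handle_request("Open facebook.com")` yields "Opening facebook.com", while a bare "Play" triggers the clarifying question. A dictionary keeps the lookup simple; a real assistant would replace the string-building handlers with functions that actually play music or open a browser.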
III. MODELING AND ANALYSIS
The project gives a fair understanding of an intelligent assistant that is capable of understanding the commands given by the user. Our assistant understands commands given by voice and responds as required. It performs the requests users ask for most frequently and makes their tasks easier. The voice assistant listens to the command given by the user through the microphone. After listening, it says "done listening", displays what the user said, and acts accordingly.
In our project we installed the gTTS package to make the voice assistant speak like a normal human being. We defined a function called 'voice assistant speak', as shown in (1). The response to the user's command is prepared as text, and gTTS converts that text into speech.
tts = gTTS(text=audio_string, lang='en')    (1)
gTTS converts the audio string, which is text, into speech. This audio string is simply the response that the voice assistant is supposed to give the user. The language of the text is chosen to be English, for which the code is 'en'. The resulting gTTS object is stored in 'tts'. The speech is saved as an audio file with the '.mp3' extension. Each audio file is given a random number from 1 to 20000000, which can be generated using random.randint(). The complete .mp3 file name is stored under the name 'audio_file'. Finally, to save the audio file we use the command shown in (2).
tts.save(audio_file)    (2)
Command (2) saves the audio file in the system (for example, 'audio24854.mp3').
Text-To-Speech: Text-to-Speech (TTS) refers to a computer's ability to read text aloud. A TTS engine converts written text into a phonemic representation, which is then converted into waveforms that can be output as sound. Third-party publishers offer TTS engines with various languages, dialects, and specialized vocabularies.
Speech Recognition: The system converts speech input to text using Google's online speech recognition system. The voice input is temporarily stored in the system before being sent to Google cloud for speech recognition; users can also obtain texts from the special corpora organized on the computer network server at the information center. After that, the equivalent text is received and fed into the central processor.
API Calls: API is an abbreviation for Application Programming Interface. An API is a software interface that allows two applications to communicate with one another. In other words, an API is the messenger that sends your request to the provider you are requesting from and then returns the response to you.
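Putting commands (1) and (2) from this section together, the speak function might look roughly like the sketch below. This is an assumed reconstruction rather than the authors' code: the identifier `voice_assistant_speak`, the helper names, and the injectable `synthesize` parameter are hypothetical, and gTTS is a third-party package that needs network access, so it is imported only when actually used.

```python
import random


def make_audio_filename():
    """Build a file name such as 'audio24854.mp3' from a random number."""
    return f"audio{random.randint(1, 20000000)}.mp3"


def gtts_synthesize(text, path):
    """Convert text to English speech with gTTS and save it as an mp3 file."""
    from gtts import gTTS  # third-party package, imported lazily

    tts = gTTS(text=text, lang="en")  # command (1)
    tts.save(path)                    # command (2)


def voice_assistant_speak(audio_string, synthesize=gtts_synthesize):
    """Turn the response text into an audio file and return the file name.

    The synthesize parameter is a hypothetical hook so that another TTS
    engine (or a stub in tests) can replace gTTS.
    """
    audio_file = make_audio_filename()
    synthesize(audio_string, audio_file)
    print(audio_string)  # also show the response as text
    return audio_file
```

Playing the saved file back through the speaker is left out, since the paper does not say which playback mechanism is used.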
Context Extraction: Context extraction (CE) is the process of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. Most of the time, this activity involves processing human language texts by means of natural language processing (NLP). Recent developments in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, could also be viewed as context extraction.
IV. RESULTS AND DISCUSSION
The required Python packages were installed and the code was implemented in the PyCharm integrated development environment (IDE). The Python code we developed runs in both Python 2.7 and Python 3.x. A few of the outputs produced by our AI-based voice assistant are shown below.
1. Google Search Output
As shown in Figure 3, when we ask the voice assistant to search for 'Akshay Kumar', it receives the request and performs the action by searching Google.
Figure 3: Output screen of performing Google Search
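One plausible way to implement the search action, assuming the assistant simply opens a Google results page in the default browser, uses only the Python standard library. The function names and the `open_browser` parameter are hypothetical, and the paper does not state how its search is actually carried out.

```python
import webbrowser
from urllib.parse import quote_plus


def build_search_url(query):
    """Encode the spoken query into a Google search URL."""
    return "https://www.google.com/search?q=" + quote_plus(query)


def search_google(query, open_browser=webbrowser.open):
    """Open the search results in the default browser and return the URL.

    open_browser is a hypothetical hook that lets callers swap in a
    different opener, for example a stub when testing.
    """
    url = build_search_url(query)
    open_browser(url)
    return url
```

Calling `search_google("Akshay Kumar")` would open the results page for that query; `quote_plus` takes care of spaces and special characters in the spoken text.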
2. Weather
As shown in Figure 4, when we ask the voice assistant about the weather, it receives the request and responds by printing the weather of that location at that time.
Figure 4: Output screen of displaying Weather Forecast
3. Generate Pdf
As shown in Figure 5, if the user wants to do two things at once, such as PDF generation alongside other tasks, the system assists by adding the user's command data to a PDF and generating it. When the user wants to store information in PDF format, the system converts the user's speech into a PDF and stores it at a specific location.
Figure 5: Output Screen of Generating Pdf
V. CONCLUSION
The project is very useful and has great potential for use in various industries. Although the program primarily focuses on how to use a personal assistant on websites, voice and the concept of voice recognition can be used in many other industries as well. It is more convenient, saves a lot of time, and is especially useful for those who have difficulty with manual operations. More applications or products may be developed using voice technology in the future of the program. Voice control changes, in a certain sense, the forms of work, which become quite different from the traditional form. A program that uses voice is useful for those who prefer voice control and for those who have difficulties or disabilities with manual operations. The primary objective of the program is to provide voice services, which allows more people to enjoy it. Beyond the program itself, we as developers learned a lot from the project. It was completely different from what we had experienced before in its working model, the volume of tasks, and the challenges we encountered. In conclusion, we learned a lot and improved a lot thanks to the project: we gained development experience as well as programming skills, and we learned that for long-term and demanding development it is important to work as a team.
VI. REFERENCES
[1] Agrawal, Nivedita Singh, Gaurav Kumar, Dr. Diwakar Yagyasen, Mr. Surya Vikram Singh, "Voice Assistant Using Python", An International Open Access-reviewed, Refereed Journal, Unique Paper ID: 152099, Volume 8, Issue 2, pp. 419-423.
[2] George Terzopoulos, Maya Satratzemi, "Voice Assistants and Smart Speakers in Everyday Life and In Education", Department of Applied Informatics, University of Macedonia, Thessaloniki, Greece.
[3] Deepak Shende, Ria Umabiya, Monika Raghorte, Aishwarya Bhisikar, Anup Bhange, "AI Based Voice Assistant Using Python", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN 2349-5162, Vol. 6, Issue 2, pp. 506-509, February 2019.
[4] Saadman Shahid Chowdury, Atiar Talukdar, Ashik Mahmud, Tanzilur Rahman, "Domain specific Intelligent personal assistant with bilingual voice command processing", IEEE, 2018.
[5] Tulshan, Amrita & Dhage, Sudhir (2019), "Survey on Virtual Assistant: Google Assistant, Siri, Cortana, Alexa", 4th International Symposium SIRS 2018, Bangalore, India, September 19-22, 2018, Revised Selected Papers. 10.1007/978-981-13-5758-9_17.
[6] Polyakov EV, Mazhanov MS, AY Voskov, LS Kachalova MV, Polyakov SV, "Investigation and development of the intelligent voice assistant for the IOT using machine learning", Moscow workshop on electronic technologies, 2018.
[7] Dr. Kshama V. Kulhalli, Dr. Kotrappa Sirbi, Mr. Abhijit J. Patankar, "Personal Assistant with Voice Recognition Intelligence", International Journal of Engineering Research and Technology, ISSN 0974-3154, Volume 10, Number 1 (2017).
[8] K. Noda, H. Arie, Y. Suga, T. Ogata, "Multimodal integration learning of robot behavior using deep neural networks", Elsevier: Robotics and Autonomous Systems, 2014.
[9] Thakur, N., Hiwrale, A., Selote, S., Shinde, A. and Mahakalkar, N., "Artificially Intelligent Chatbot".
[10] Huang, J., Zhou, M. and Yang, D., 2007, January, "Extracting Chatbot Knowledge from Online Discussion Forums", in IJCAI (Vol. 7, pp. 423-428).
[11] JKhawir Mahmood, Tausfer Rana, Abdur Rehman Raza, "Singular adaptive multi role intelligent personal assistant (SAM-IPA) for human computer interaction", International conference on open source system and technologies, 2018.
[12] Piyush Vashishta, Juginder Pal Singh, Pranav Jain, Jitendra Kumar, "Raspberry PI based voice-operated personal assistant", International Conference on Electronics And Communication and Aerospace Technology, ICECA, 2019.
[13] Veton Kepuska and Gamal Bohota, "Next generation of virtual assistant (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home)", IEEE conference, 2018.