IJCSP24B1264
org © 2024 IJCSPUB | Volume 14, Issue 2 June 2024 | ISSN: 2250-1770
AI Voice Assistant
Department of Information Technology, ABES Engineering College, Ghaziabad, India (Communicating
author: Ms. Jaya Srivastava)
Abstract: JARVIS is an advanced virtual voice assistant that leverages cutting-edge technology, including
Google Text-to-Speech (gTTS) and Python, to create a personalized assistant experience. Integrating AIML
capabilities and the industry-leading text-to-speech platform from Google, JARVIS offers male and female
voice options through the gTTS libraries, inspired by the Marvel universe. The dynamic Pyttsx Python library
plays a crucial role in optimizing the dialogue interactions between the assistant and users. JARVIS assists
users in everyday tasks such as interpreting general human speech, conducting web searches on Google, Bing,
or Yahoo, searching for videos and images, providing live weather updates, defining words, and reminding
users of scheduled events and tasks. This system is the culmination of contributions from multiple sources,
utilizing AIML's adaptability and the seamless integration with Python (pyttsx) and gTTS (Google Text-to-
Speech), resulting in a highly reusable and low-maintenance structure for JARVIS.
I.INTRODUCTION
AI voice assistants, sometimes referred to as virtual or digital assistants, are devices that react to users by
utilising artificial intelligence (AI), natural language processing, and speech recognition technologies. The
gadget uses technology to compile, analyse, and assess user communications while providing insightful
feedback. Actual dialogues can be facilitated by artificial intelligence. Virtual assistants carry out duties for
users and can hear and interpret voice commands in natural language. These duties, which were formerly
handled by a personal assistant or secretary, include dictation, reading emails or texts aloud, and setting up
appointments for end users. In addition, the AI assistant can obtain directions, send messages, and take
phone calls. It can also read news and weather reports, open Google, YouTube, Stack Overflow, and other
websites, respond to inquiries, scrape the web, and play music. While this
definition focuses on the digital side of things, the term "virtual assistant" or "virtual personal assistant" is
also surprisingly often used to refer to contract workers employed by United Nations agencies who work
remotely and accomplish administrative tasks typically handled by secretaries, assistants, or executives.
Responsive advisers, another type of consumer-facing AI programme, are comparable to digital assistants.
Virtual assistants are task-oriented, while responsive adviser programmes are topic-oriented.
"Virtual assistants are usually cloud-based apps that run on devices and/or apps that are linked to the
internet." Large quantities of data are needed to power the platforms, machine learning, speech
recognition, and natural language processing that underpin virtual assistants. There are gadgets
specifically designed to offer virtual support. The most popular AI voice assistants available are those
from Amazon, Google, and Microsoft, which offer Alexa, Google Assistant, and Cortana, respectively.
II.PROBLEM STATEMENT
Nowadays, in a fast-moving world, people want their work delivered ready to use, a need that AI technology
can address. AI voice assistants frequently help users with basic tasks like adding tasks to their calendars,
retrieving information that can typically be found online, controlling and monitoring smart devices in
the home, sending emails, setting up alarms, getting the weather, providing the user's location, performing
basic math operations, checking news, turning on the music, and opening various websites like Facebook,
YouTube, Stack Overflow, etc.
III.LITERATURE REVIEW
Alotto et al. (2020) introduce an innovative approach to building modeling with artificial intelligence
and speech recognition specifically for educational purposes. Their methodology leverages AI to
create interactive and dynamic learning environments, providing a foundation for more engaging and
personalized educational experiences.
Similarly, Terzopoulos and Satratzemi (2019) explore the integration of voice assistants and artificial
intelligence in educational settings. Their research demonstrates how voice assistants can facilitate
collaborative learning and provide immediate access to information, thereby enhancing the overall
efficiency and effectiveness of the educational process.
Canbek and Mutlu (2016) focus on the role of intelligent personal assistants in education, emphasizing
their potential to offer personalized support and adapt to the unique needs of individual learners. This
adaptability is critical for fostering an inclusive learning environment where all students can thrive.
Beirl et al. (2019) examine the impact of voice assistant skills on family life. Their study highlights
the ability of voice assistants to enhance family interactions by providing convenient access to
information and facilitating communication. The authors note that voice assistants can play a
significant role in improving family dynamics and fostering more interactive and engaging
experiences.
Malodia et al. (2021) investigate the motivations behind the use of AI-enabled voice assistants in daily
life. Their research identifies key factors such as convenience, accessibility, and user satisfaction,
which drive the widespread adoption of these technologies. The study underscores the growing
reliance on voice assistants for managing everyday tasks and enhancing the quality of life.
Nasirian et al. (2017) evaluate AI-based voice assistant systems from the perspectives of interaction
and trust. Their findings emphasize the importance of building trust between users and AI systems to
ensure meaningful and effective interactions. Trust is identified as a crucial component in the adoption
and success of AI-enabled voice assistants.
Steen and Wilroth (2021) present an adaptive voice control system using AI, showcasing
advancements in voice recognition and control. Their research illustrates the potential for AI to
improve the functionality and user experience of voice assistants, making them more responsive and
intuitive.
RAJA (2020) discusses the development of AI-powered voice assistants using Python. This work
provides insights into the technical aspects of creating and implementing voice assistants, highlighting
the importance of robust programming languages and frameworks in achieving optimal performance.
Sangpal et al. (2019) also focus on the technical development of voice assistants, specifically through
the integration of AIML (Artificial Intelligence Markup Language) and Python. Their research
demonstrates how these technologies can be used to enhance the capabilities and functionalities of
voice assistants.
Vora et al. (2021) examine the development of a PC voice assistant named Jarvis. Their study
showcases the practical applications of voice assistants in personal computing, emphasizing their
versatility and ability to simplify daily tasks.
Tibola and Tarouco (2013) explore interoperability in virtual worlds, with a focus on
integrating voice assistants to enhance user experiences. Their research highlights the
importance of seamless communication between different systems and platforms, which is
essential for creating more immersive and interactive virtual environments.
IV.PROPOSED METHODOLOGY
To get the best results, the voice assistant switches to voice mode and asks the user to provide input
in text or speech format. An application called "WO-MIC" also lets the user control the programme
from a phone: it essentially transforms any Android phone into a wireless microphone and reduces
ambient noise.
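The choice between text and speech input can be illustrated with a minimal sketch. It assumes the third-party SpeechRecognition package (with a working microphone backend) for the speech path; the function names `listen_once` and `take_command` are hypothetical and are not taken from the JARVIS source.

```python
from typing import Callable


def listen_once() -> str:
    """Capture one utterance from the default microphone and transcribe it."""
    # Assumption: the SpeechRecognition package is installed; it is imported
    # lazily so the text path works without it.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)


def take_command(mode: str, read_text: Callable[[], str] = input) -> str:
    """Return the user's command as normalised lower-case text."""
    if mode not in ("text", "speech"):
        raise ValueError(f"unknown input mode: {mode!r}")
    raw = read_text() if mode == "text" else listen_once()
    # Collapse repeated whitespace so later keyword matching is simpler.
    return " ".join(raw.lower().split())
```

Passing the text reader as a parameter keeps the function easy to exercise without a keyboard or microphone.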
Wikipedia Search: Users can ask the assistant to search Wikipedia, and the assistant will retrieve the
information from the internet. The output, limited to a few lines, is shown in the console window and
read aloud.
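A Wikipedia lookup of this kind might be sketched as follows. The sketch assumes the third-party `wikipedia` package; the helper `extract_topic` and its list of filler words are hypothetical, chosen only for illustration.

```python
import re


def extract_topic(command: str) -> str:
    """Strip filler words such as 'search wikipedia for' from a command."""
    # Assumption: these three filler words are enough for simple commands;
    # a topic that itself contains one of them would be altered.
    cleaned = re.sub(r"\b(search|wikipedia|for)\b", " ", command.lower())
    return " ".join(cleaned.split())


def search_wikipedia(command: str, sentences: int = 2) -> str:
    """Return a summary of a few sentences for the topic in the command."""
    import wikipedia  # third-party package, imported lazily (assumption)

    return wikipedia.summary(extract_topic(command), sentences=sentences)
```

Limiting the summary to two sentences mirrors the "few lines" of output described above; the returned string can then be printed and passed to the speech engine.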
News: The user can obtain up-to-date news about their country, the world, technology, sports, the
entertainment industry, and much more by simply asking the assistant, which opens a new tab with the
updates. The assistant can also retrieve information from websites, bring it back to the console, and
read it aloud to the user.
Weather Forecast: This feature allows users to get the predicted weather for any place. The report
includes the temperature in Kelvin and the humidity.
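Because the temperature is reported in Kelvin, a conversion step may make the spoken report easier to follow. The sketch below assumes the weather service returns a JSON payload with `main.temp` (Kelvin) and `main.humidity` (percent); this shape and the function names are assumptions, since the paper does not name the weather API.

```python
def kelvin_to_celsius(kelvin: float) -> float:
    """Convert a temperature from Kelvin to degrees Celsius."""
    return kelvin - 273.15


def describe_weather(city: str, payload: dict) -> str:
    """Build a one-line report from an (assumed) weather payload."""
    main = payload["main"]
    kelvin = main["temp"]
    celsius = kelvin_to_celsius(kelvin)
    return (f"Weather in {city}: {kelvin:.2f} K ({celsius:.1f} C), "
            f"humidity {main['humidity']}%")


# Hypothetical sample payload, not real measured data.
sample = {"main": {"temp": 300.15, "humidity": 60}}
print(describe_weather("Ghaziabad", sample))
```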
Open Applications: Using Python's webbrowser and os libraries, the assistant opens websites such as
YouTube and the Google search engine, other webpages, and system programmes (for example, a code
editor, Notepad, or Chrome).
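Opening a website from a spoken command can be sketched with the standard-library `webbrowser` module. The site table and the function names below are illustrative assumptions, not the project's actual mapping.

```python
import webbrowser
from typing import Optional

# Hypothetical lookup table from spoken names to URLs.
SITES = {
    "youtube": "https://www.youtube.com",
    "google": "https://www.google.com",
    "stack overflow": "https://stackoverflow.com",
}


def resolve_site(command: str) -> Optional[str]:
    """Return the URL of the first known site named in the command."""
    lowered = command.lower()
    for name, url in SITES.items():
        if name in lowered:
            return url
    return None


def open_site(command: str) -> bool:
    """Open the matching site in the default browser, if there is one."""
    url = resolve_site(command)
    if url is None:
        return False
    return webbrowser.open(url)
```

For local programmes on Windows, `os.startfile` could play the same role; that function exists only on Windows.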
Close Applications: When asked to close an application, the assistant does so by running the Windows
command "TASKKILL /F /IM file.exe", where file.exe is the executable of the application.
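A minimal sketch of how the TASKKILL command might be issued from Python is given below. It assumes a Windows host; the helper names are hypothetical. Passing the arguments as a list avoids shell quoting problems.

```python
import subprocess
from typing import List


def build_taskkill_command(executable: str) -> List[str]:
    """Build the argument list that force-closes a process by image name."""
    # /F forces termination, /IM selects the process by its image (file) name.
    return ["taskkill", "/F", "/IM", executable]


def close_application(executable: str) -> bool:
    """Run TASKKILL (Windows only) and report whether it succeeded."""
    result = subprocess.run(
        build_taskkill_command(executable),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```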
Automation: Using the keyboard Python library, the programme automates YouTube and other search
engines. All the user needs to do is provide input, and the assistant will handle the automation request.
Repeating the User: The voice assistant's talk and take-command functions even allow it to repeat what
the user says.
WhatsApp Messages: The assistant asks for the recipient's name or mobile number, the message to send,
and the time at which the message should be sent. The voice assistant then sends the message at that
time and notifies the user. Pywhatkit is the Python library used to accomplish this. Additionally, the
pywhatkit database file will contain the history of messages.
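Scheduling a WhatsApp message could look like the sketch below. It relies on `pywhatkit.sendwhatmsg`, which takes a phone number with country code, the message, and the hour and minute in 24-hour time; the validation helpers and the "HH:MM" input format are assumptions made for illustration.

```python
from typing import Tuple


def parse_send_time(spoken: str) -> Tuple[int, int]:
    """Parse a 24-hour 'HH:MM' string into (hour, minute)."""
    hour_text, minute_text = spoken.strip().split(":")
    hour, minute = int(hour_text), int(minute_text)
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"time out of range: {spoken!r}")
    return hour, minute


def schedule_whatsapp(phone: str, message: str, spoken_time: str) -> None:
    """Validate the inputs, then hand the message to pywhatkit."""
    if not phone.startswith("+"):
        raise ValueError("phone number must start with a country code")
    hour, minute = parse_send_time(spoken_time)
    import pywhatkit  # third-party package, imported lazily (assumption)

    pywhatkit.sendwhatmsg(phone, message, hour, minute)
```

Validating before the import means that malformed input is rejected even on a machine where pywhatkit is not installed.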
Checking Internet Speed: This application uses the speedtest Python Library to check the speed of the
user's connection and returns the results to the console.
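The speed check might be sketched as follows, assuming the `speedtest` module from the speedtest-cli package, whose `download()` and `upload()` methods report bits per second. The helper `to_mbps` and the result keys are hypothetical.

```python
def to_mbps(bits_per_second: float) -> float:
    """Convert bits per second to megabits per second, rounded to 2 places."""
    return round(bits_per_second / 1_000_000, 2)


def check_speed() -> dict:
    """Measure download and upload speed and return both in Mbps."""
    import speedtest  # from the speedtest-cli package, imported lazily

    tester = speedtest.Speedtest()
    tester.get_best_server()  # pick the lowest-latency test server
    return {
        "download_mbps": to_mbps(tester.download()),
        "upload_mbps": to_mbps(tester.upload()),
    }
```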
Checking my location: This feature lets users see where they are right now and get directions to any
place.
Playing Music: When the user requests music, the voice assistant plays it, either from the user's
system or through an online search, without the user having to do it manually.
Jarvis translator: this function converts the user-inputted text into the target language. The dictionary
contains entries for more than forty human languages.
Audiobooks: With the aid of the pypdf python library, the voice assistant will open and read the book in
your preferred language, making the application incredibly appealing.
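The audiobook feature can be outlined as a loop that extracts text page by page and hands it to the speech engine. The sketch assumes the `pypdf` and `pyttsx3` packages; the function names are hypothetical, and translation into a preferred language is not covered here.

```python
import re


def clean_page_text(text: str) -> str:
    """Rejoin words hyphenated across line breaks and collapse whitespace."""
    joined = re.sub(r"-\n", "", text)
    return " ".join(joined.split())


def read_pdf_aloud(path: str, start_page: int = 0) -> None:
    """Read a PDF aloud from the given zero-based page onwards."""
    # Assumption: both third-party packages are installed.
    from pypdf import PdfReader
    import pyttsx3

    reader = PdfReader(path)
    engine = pyttsx3.init()
    for index in range(start_page, len(reader.pages)):
        text = clean_page_text(reader.pages[index].extract_text() or "")
        if text:
            engine.say(text)
            engine.runAndWait()  # block until this page has been spoken
```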
Taking Notes: The assistant makes a note so that the user's important information is saved for later.
Sending Mails: With this feature, users can email any contact that has an email address. The voice
assistant then notifies the user that the task was completed successfully.
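Sending mail needs only the Python standard library. The sketch below separates building the message from sending it; the host, port, and credentials are placeholders supplied by the caller, and the function names are hypothetical.

```python
import smtplib
from email.message import EmailMessage


def build_email(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text email message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_email(message: EmailMessage, host: str, port: int, password: str) -> None:
    """Send the message over SMTP with STARTTLS (server details are assumptions)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(message["From"], password)
        server.send_message(message)
```

Once `send_email` returns without raising, the assistant can announce that the task was completed.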
V.SYSTEM ARCHITECTURE
The system architecture of AI-enabled voice assistants is a complex and multi-layered structure
designed to provide efficient and responsive interaction with users. This architecture typically includes
several key components: the user interface, natural language processing (NLP) module, dialogue
management, backend services, and integration with external services. Below is a detailed description
of each component.
The system architecture of AI-enabled voice assistants is designed to facilitate natural and efficient
interaction between users and technology. By integrating advanced NLP, robust dialogue
management, scalable backend services, and secure data handling, these systems provide versatile and
powerful tools for a wide range of applications in education, daily life, and beyond.
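The flow from user input through the NLP module and dialogue management to a backend service can be illustrated with a toy sketch. Keyword matching stands in for real NLP, and every class and function name here is an assumption made for illustration, not the architecture's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Intent:
    name: str
    argument: str


def parse_intent(utterance: str) -> Intent:
    """NLP stand-in: map an utterance to an intent by its leading keyword."""
    words = utterance.lower().split()
    if words and words[0] in ("open", "play", "search"):
        return Intent(words[0], " ".join(words[1:]))
    return Intent("unknown", " ".join(words))


class DialogueManager:
    """Routes intents to backend handlers and keeps the dialogue history."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[str], str]] = {}
        self.history: List[Tuple[str, str]] = []

    def register(self, intent_name: str, handler: Callable[[str], str]) -> None:
        self.handlers[intent_name] = handler

    def respond(self, utterance: str) -> str:
        intent = parse_intent(utterance)
        handler = self.handlers.get(intent.name)
        reply = handler(intent.argument) if handler else "Sorry, I did not understand that."
        self.history.append((utterance, reply))
        return reply


manager = DialogueManager()
manager.register("open", lambda target: f"Opening {target}")  # backend service stub
print(manager.respond("Open YouTube"))
```

Registering handlers by intent name keeps external services loosely coupled to the dialogue logic, so a new capability can be added without changing the manager.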
VI.USE CASE DIAGRAM
A use case diagram is a visual representation of the interactions between users (actors) and a system.
It depicts the various functionalities (use cases) that the system offers and the actors involved in these
interactions. Below, we outline a use case diagram for an AI-enabled voice assistant, highlighting the
primary actors and their interactions with the system.
The use case diagram for AI-enabled voice assistants provides a clear overview of the system's functionalities and
interactions between different actors. It helps in understanding how users, system administrators, and
third-party service providers interact with the voice assistant to perform various tasks and services.
This diagram is essential for designing and developing a robust and user-friendly voice assistant
system.
VII. RESULT
By utilizing artificial intelligence, we conducted tests to evaluate the efficacy of AI-enabled voice
assistants in various domains, including education, family life, and daily interactions. We gathered
diverse datasets, preprocessed the data, trained machine learning models, and implemented voice
assistant applications for real-time use. Key aspects of the data included user commands, interaction
contexts, and response accuracy metrics.
The results of our experiments underscore the substantial benefits of AI-enabled voice assistants
across different domains. In education, they enhanced learning experiences and accessibility. In
family life, they improved communication and task management. Technological advancements in
speech recognition and NLP significantly improved their performance and user satisfaction.
Integration with external services expanded their functionality, making daily life more convenient
and efficient. These findings highlight the transformative potential of AI voice assistants in
enhancing various aspects of daily life.
CONCLUSION
Jarvis is an artificial intelligence voice assistant system that combines natural language processing, neural
networks, speech recognition, and other AI approaches to create a smart system that responds intelligently to
the conditions that are presented. It can lessen the workload associated with routine human tasks and even
replace certain human labour positions, such as personal secretaries who manage individual clients' daily
schedules. The system's ability to intelligently and thoroughly interact with other subsystems is crucial.
The phases of the system are as follows: an input phase, during which information is provided or a query is
voiced or written; interpretation of the text using voice data processing; data storage; and speech output
generation from the processed text to the Jarvis console. Each step's output can then be utilised to extract
patterns and analyse them for subsequent usage. This may serve as the primary foundation for machines with
artificial intelligence to learn and identify patterns on behalf of humans. Thus, based on a review of the
literature and an examination of existing systems, it can be concluded that our system will help users stay
better organised in addition to facilitating interaction with other systems and modules.
REFERENCES
[1] Alotto, F., Scidà, I., and Osello, A. (2020). "Building modeling with artificial intelligence and speech
recognition for educational purposes." Proceedings of the EDULEARN20 Conference, Vol. 6. 7th.
[2] Beirl, D., Rogers, Y., and Yuill, N. (2019). "Utilizing voice assistant skills in family life." Computer-
Supported Collaborative Learning Conference, CSCL, Vol. 1, International Society of the Learning
Sciences, Inc. 96–103.
[3] Canbek, N. G., and Mutlu, M. E. (2016). "On the path of artificial intelligence: Learning with intelligent
personal assistants." Journal of Human Sciences, 13(1), 592–601.
[4] Malodia, S., Islam, N., Kaur, P., and Dhir, A. (2021). "Why do people use AI-enabled voice assistants?"
IEEE Transactions on Engineering Management.
[5] Nasirian, F., Ahmadian, M., and Lee, O.-K. D. (2017). "AI-based voice assistant systems: Evaluating
from the interaction and trust perspectives."
[7] Sangpal, R., Gawand, T., Vaykar, S., and Madhavi, N. (2019). "Jarvis: An interpretation of AIML with
integration of gTTS and Python." 2019 2nd International Conference on Intelligent Computing,
Instrumentation and Control Technologies (ICICICT), Vol. 1. 486–489.
[8] Steen, J., and Wilroth, M. (2021). "Adaptive voice control system using AI."
[9] Terzopoulos, G., and Satratzemi, M. (2019). "Voice assistants and artificial intelligence in education."
Proceedings of the 9th Balkan Conference on Informatics. 1–6.
[10] Tibola, L. R., and Tarouco, L. M. R. (2013). "Interoperability in virtual worlds." XVIII Congreso
Argentino de Ciencias de la Computación.
[11] Vora, J., Yadav, D., Jain, R., and Gupta, J. (2021). "Jarvis: A PC voice assistant."