A Systematic Review of Voice Assistant Usability: An ISO 9241-11 Approach

SN Computer Science (2022) 3:267
https://doi.org/10.1007/s42979-022-01172-3
ORIGINAL RESEARCH
Faruk Lawal Ibrahim Dutsinma¹ · Debajyoti Pal¹ · Suree Funilkul² · Jonathan H. Chan¹
Received: 6 January 2022 / Accepted: 20 April 2022 / Published online: 3 May 2022
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2022
Abstract
Voice assistants (VAs) are an emerging technology that has become an essential tool of the twenty-first century. Their ease of access and use has generated strong interest in their usability. Usability is an essential aspect of any emerging technology, with every technology having a standardized usability measure. Despite the high acceptance rate of VAs, to the best of our knowledge, few studies have been carried out on their usability. In this context, we reviewed studies that used voice assistants for various tasks. Our study highlights the usability measures currently used for voice assistants, both within the ISO 9241-11 framework and outside of it, to provide a comprehensive view, as well as the independent variables used and their context of use. We employed the ISO 9241-11 framework as the measuring tool in our study. A diverse range of independent variables used to measure usability is identified. We also specify the independent variables that have not yet been used to measure some aspects of the usability experience. We conclude with what has been carried out on voice assistant usability measurement and what research gaps are present. We also examine whether the ISO 9241-11 framework can be used as a standard measurement tool for voice assistants.
Keywords Voice assistants · Systematic literature review · Usability · User experience · ISO 9241-11 framework
* Debajyoti Pal: debajyoti.pal@mail.kmutt.ac.th; debajyoti.pal@gmail.com
Faruk Lawal Ibrahim Dutsinma: lawal.faruk@mail.kmutt.ac.th
Suree Funilkul: suree@sit.kmutt.ac.th
Jonathan H. Chan: jonathan@sit.kmutt.ac.th
1 Innovative Cognitive Computing (IC2) Research Center, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
2 School of Information Technology, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand

Introduction

Voice assistants (VAs) which are also called intelligent personal assistants are computer programs capable of understanding and responding to users using synthetic voices. Voice assistants have been integrated into different technological devices, including smartphones and smart speakers [1]. The voice modality is the central mode of communication used by these devices, rendering the graphic user interface (GUI) inapplicable or less meaningful [2]. People use VA technology in different aspects of their lives, such as for simple tasks like getting the weather report [3] or managing emails [4]. In addition, the VA can perform complex tasks like client representative tasks [5] and controllers in autonomous vehicles [6]. In other words, VA’s can revolutionize the way people interact with computing systems [7]. Currently, there is a massive global adoption of voice assistants. A report in [8] indicates that 4.2 billion VA’s were adopted and used in 2020 alone, with a projected increase to 8.4 billion by 2024. The popularity of VA’s has led to a greater research attention to its usability and user experience aspect.
Usability is a critical factor in the adoption of voice assistants [9]. A study by Zwakman et al. [10] highlighted the importance of usability in voice assistants [9]. An additional study by Coronado et al. [11] reiterated the importance of usability in human–computer interaction tools. Numerous studies have been carried out on the usability heuristics used in a VA, each study adopting a unique approach. A study by Maguire [12] used the Nielsen and Molich versions of Voice
User Interface (VUI), and the heuristic Voice User Interface (VUI), to evaluate the ease of use of the VA’s. The study affirmed that both heuristics were appropriate. However, the study noted that one was less problematic to use than the other [12]. A further study tested VUI heuristics to measure VA efficacy [13]. However, a critical factor that prevents the VA from adopting the heuristic currently available is the absence of a graphical user interface (GUI). Despite numerous studies on heuristics, the level of satisfaction is still low [14]. Furthermore, heuristics cannot be used as a standardized approach because they are approximate strategies or empirical rules for decision-making and problem-solving that do not ensure a correct solution. According to a study by Murad [16], the absence of standardized usability guidelines when developing VA interface presents a challenge in the development of an effective VA [15]. Another report from Budi & Leipheimer [17] also suggests that the usability of the VA’s requires improvements and standardization [16]. To create a standard tool, a globally recognized and well-known organization is critical in the process because it eliminates bias and promotes neutrality [17]. The International Organization for Standardization (ISO) 9241-11 framework is one of the standard usability frameworks widely used for measuring technology acceptance.
According to the ISO 9241-11 framework, usability is defined as “the degree to which a program may be utilized to achieve measurable objectives with effectiveness, efficiency, and satisfaction in a specific context of usage” [18]. ISO 9241-11 provides a framework for understanding and applying the concept of usability in an interactive system and environment [19]. The main advantage of using the ISO standard is that industries and developers do not need to build different design measurement tools. This standard is intended to create compatibility with new and existing technologies, and also create trust [20]. Currently, the system developers do not have any standardized tool created specifically for the measurement of VA usability; consequently, the measures are decentralized, causing confusion among developers. The lack of in-depth assessment of the current heuristics used in the VA design affects the trust and adaptability of their users [15]. Other emerging technologies such as virtual reality [21] and game design [22] have understood the importance of creating an acceptable standardized measurement tool when designing new interfaces. Therefore, VA technology could also benefit significantly from the same concept. As evident from the above discussion, there is little to no focus on VA standardization.
Our study presents a systematic literature review comprising works carried out on the usability of voice assistants. In addition, we use the ISO 9241-11 framework as a standardized measurement tool to analyze the findings from the studies we collected. We chose the ACM and IEEE databases for the selection of our articles because both contain a variety of studies dealing with the usability aspects of VA’s. The following are the contributions of this literature review to the Human–Computer Interaction (HCI) community:

1. Our work highlights the studies currently carried out on VA usability. This includes the independent and dependent variables currently used.
2. Our study highlights the factors that affect the voice assistants' acceptance and impact the user’s total experience.
3. We identify and explain some attributes unique to only voice assistants, such as machine voice.
4. We also highlight the evaluation techniques used in previous studies to measure usability.
5. Finally, our study tries to compare the existing usability studies with the ISO 9241-11 framework. The decentralized approach of the VA usability measurement makes it vague to understand if the ISO 9241-11 framework is being adhered to whilst developing the usability metrics.

We hope that our input will highlight the integration of the current existing VA usability measures with the ISO 9241-11 framework. This will also verify whether the ISO 9241-11 framework can serve as a standard measure of usability in voice assistants. In conclusion, our study tries to answer the following four research questions:

• RQ1: Can the ISO 9241-11 framework be used to measure the usability of the VA’s?
• RQ2: What are the independent variables used when dealing with the usability of VA’s?
• RQ3: What current measures serve as the dependent variables when evaluating the usability of VA’s?
• RQ4: What is the relationship between the independent and dependent variables?

The remaining work is structured as follows. The second section presents the related work. This highlights what previous literature review studies had been carried out on voice agents’ usability; furthermore, the section also highlights the emergent technology that employed the ISO 9241-11 framework as a usability measuring tool. This is followed by the methodology section, which presents the inclusion and exclusion criteria used together with the review protocol. Furthermore, the query created for the database search is presented, and the database to be used is also selected. The fourth section presents the result and analysis. In this phase, the article used for this study is listed. Also, the research questions are answered. The fifth section contains discussion on the result analysis. This includes a more detailed explanation of the relationships between independent and dependent variables. Our insights and observations are included in this section as well.
Literature Review

Previous Systematic Reviews

There have been a number of systematic literature reviews concerning VA’s over the years. Table 1 presents the information for a few of the relevant works.
As highlighted in Table 1, multiple systematic literature reviews have been carried out on VA's usability over the years. However, each study has a specific limitation and gap for improvement. For instance, some studies focus on the usability of voice assistants used only in specified fields such as education [25] and health [36]. Other studies focus on the usability of voice assistants concerning only specific age groups, such as older adults [28]. Likewise, although an in-depth analysis of the usability of the VA’s is carried out involving every usability measure in [32], this study does not use the ISO 9241 framework as a measuring standard. On the other hand, although another study [33] uses the ISO 9241 framework as a measuring standard, its usage context was chatbots focusing primarily on text-based communication instead of voice. Overall, the available literature reviews on VA’s usability listed in Table 1 support the view that very few of the current literature review studies on VA’s use the ISO 9241-11 framework as an in-depth tool for measuring usability.

The ISO 9241-11 Usability Framework

The ISO 9241-11 is a usability framework used to understand usability in situations where interactive systems are used and employed, which includes framework environments, products, and services [39]. Nigel et al. [40] conducted a study to revise the ISO 9241-11 framework standard, which reiterates the importance of the framework within the concept of usability. A number of studies have been conducted on various technologies using the ISO 9241-11 framework as a tool to measure their usability. This shows the diversified approach when using the framework. For instance, a study by Karima et al. (2016) proposed the use of ISO 9241-11 framework to measure the usability of mobile applications running on multiple operating systems by developers, in which the study identified display resolution and memory capacity as factors that affect the usability of using mobile applications [41]. Another study used the ISO 9241-11 framework to identify usability factors when developing e-government systems [42]. This study focused on the general aspect of e-Government system development and concluded the framework could be used as a usability guideline when developing a government portal. In addition, the ISO 9241-11 framework was also used to evaluate other available methods and tools. For instance, a study by Maria et al. [44] used the framework to evaluate existing tools used in the measurement of usability of software products and artifacts on the web. The study compared existing tools with the ISO 9241-11 measures for efficiency, effectiveness and satisfaction [43]. The ISO 9241-11 framework has also been employed as a standardization tool in the geographic field [44], game therapy in dementia [45], and logistics [46]. Despite the ISO 9241-11 usability framework being utilized in different aspects of old and emergent technologies, it has not been used with a VA in the past.

Methods

We performed a systematic literature review in this study using the guidelines established by Barbara [47]. These guidelines have been widely used in other systematic review studies as a result of their rigor and inclusiveness [48]. In addition, we have added a new quality assessment process to our guidelines. The quality assessment is a list of questions that we use to independently measure each study to ensure its relevance for our review. Our quality evaluation checklists are derived from existing studies [49, 50]. The complete guidelines used in this section comprise four different stages:

1. Inclusion and exclusion criteria
2. Search query
3. Database and article selection
4. Quality assessment

Inclusion and Exclusion Criteria

The inclusion and exclusion criteria used in our study are developed for completeness and avoidance of bias. The criteria we used for our study are:

a. Studies that focus on VA, with voice being the primary modality. In scenarios where the text or graphical user interfaces are involved, they should not be the primary focus.
b. Studies are only in the English language to avoid mistakes during translation from another language.
c. The studies include at least one user and one voice assistant to ensure that the focus is on usability, not system performance.
d. Study has a comprehensive conclusion.
Table 1 Current literature reviews
# | Article name | Summary | Limitations | Usability focus
[23] | Smart Home Voice Assistants: A Literature Survey of User Privacy and Security Vulnerabilities | The study explores the potential use vulnerabilities encountered while using the voice assistant. The studies looked at the vulnerabilities, associated attack vectors, and possible mitigation measures that users can take to protect themselves during the use of voice assistant | Privacy and vulnerability are not the primary focus in usability | Personal Smart Home use
[24] | Intelligent personal assistants: A systematic literature review | The natural language interfaces allow the human–computer interaction by the translation of the human intention in the controls of the devices, the analysis of the speech or the gestures of the user. The article looked at the major trends, critical areas and challenges of an intelligent personal assistant. The study also proposed a taxonomy for IPA classification. The method used the population, intervention, comparison, outcome, and context (PICOC) criteria | The study did not conduct a thorough review of what was done with respect to the usability of the voice assistant | General use
[25] | Virtual Assistants for Learning: A Systematic Literature Review | The motivation, commitment and decreasing interest of students in the learning process has always existed, contributing to increased failures and dropouts. This can be attributed to the difficulties with time management. The growing number of students in higher education makes it impossible to provide individual tutoring and support to each student. This paper systematically examines the use of virtual assistants in tertiary education. It focuses on the technology which fuels them, their characteristics and their impact in the learning process | The study focused on voice assistants used only within an educational environment that motivates users | Education
[26] | Voice-Based Conversational Agents for the Prevention and Management of Chronic and Mental Health Conditions: Systematic Literature Review | Chronic and mental diseases are increasingly prevalent throughout the world. As devices in our everyday lives offer more and more voice-based self-service, voice assistant can support the prevention and management of these conditions. This study highlights the current methods used in the evaluation of health interventions for the prevention and management of chronic and mental health conditions delivered through voice assistant | The study only focused on voice assistants used in the health environment alone | Health
[27] | Tourists’ Attitudes toward the Use of Artificially Intelligent (AI) Devices in Tourism Service Delivery: Moderating Role of Service Value Seeking | This study examines tourist attitudes towards the use of voice assistants in relatively more utilitarian or hedonic (air and hotel) tourism services. The results of the study suggest that tourism acceptance of VA is influenced by social influence, hedonistic motivation, anthropomorphism, expectation of performance and exertion, and emotions towards artificially intelligent devices. These results suggest that while the use of voice assistants in the provision of functional services is acceptable, the use of AI devices in the delivery of hedonic services could backfire | The study was not on usability, to be specific, but on adopting voice assistants in the tourism environment | Tourism
[28] | Exploring How Older Adults Use a Smart Speaker–Based Voice Assistant in Their First Interactions: Qualitative Study | Smart speaker-based voice assistants promise support for the aging population, with the benefits of hands-free and eye-free interaction to process applications. This study explores how older adults experience and react to a voice assistant when they first interact with it. The study discusses design implications that can positively influence older adults using voice assistant, including helping better understand how a voice assistant works and tailoring it to the needs of older adults | The study group focused on was only older adults | General Use
[29] | A Meta-Analytical Review of Empirical Mobile Usability Studies | This document provides a usability assessment framework tailored to the context of a mobile IT environment. The study conducted a qualitative meta-analysis of more than 100 empirically based mobile usability studies. This study included the contextual factors studied, the dimensions of core and peripheral usage measured. Furthermore, open and unstructured tasks are under-utilized, and the effects of interaction between interactivity and complexity warrant further study | The impacts of user characteristics and environment on usability were not explored in the study | Mobile computing
[30] | Evaluation of COVID-19 Information Provided by Digital Voice Assistants | Digital voice assistants are widely used to search for health information during COVID-19. With the rapidly changing nature of COVID-19 information, there is a need to assess the COVID-19 information provided by voice assistants to meet consumer needs and prevent disinformation. The goal of this study is to evaluate the COVID-19 information provided by voice assistants in terms of relevance, accuracy, usability and reliability. The study found that information about this pandemic is evolving rapidly and that users must use good judgment when obtaining COVID-19 information from voice assistants | The study focused only on voice assistants used in COVID-19 related issues | Health
[31] | The human side of human-Chatbot interaction: A systematic literature review of ten years of research on text-based chatbot | Over the last ten years there has been a growing interest around text-based chatbot, software applications interacting with humans using natural written language. However, despite the enthusiastic market predictions, ‘conversing’ with this kind of agents seems to raise issues that go beyond their current technological limitations, directly involving the human side of the interaction. This study suggests a number of research opportunities that could be explored over the next few years | The study focused only on chatbot with textual modality. Moreover, chatbot use a Graphic User Interface that is not present in voice assistants | General Use
[32] | Voice in Human–Agent Interaction: A Survey | Social robots, conversational agents, voice assistants and other embodied AIs are increasingly a characteristic of daily life. The connection between these different types of intelligent agents is their ability to interact with people by voice. The voice becomes an essential mode of embodiment, communication and interaction between IT operators and end users. This study presents a meta-synthesis of the voice of agents in the conception and experience of agents from a man-centered point of view: voice assistant | The study did not use the ISO 9241-11 framework as a reference in their measurement scale | General Use
[33] | Usability of Chatbots: A Systematic Mapping Study | The use of chatbot has increased considerably in recent years. As a result, it is essential to integrate conviviality into their development. The study identifies the state of the art in the conviviality of chatbot and the applied techniques of human–computer interaction, to analyze how to assess the conviviality of chatbot | The study focused only on chatbot with textual modality. Moreover, chatbot use a Graphic User Interface that is not present in voice assistants | General Use
[34] | A Literature Review On Chatbot In Healthcare Domain | The study highlighted chatbot used in the healthcare environment. Also, it compares the techniques such as NLU, NLG, and ML used in chatbot development | The study deals with chatbot with textual modality. Also, the study deals with chatbot used only in a healthcare environment | Health
[35] | Review of Chatbot Design Techniques | The study reviewed the techniques and factors considered when designing a chatbot. It also highlighted how chatbot worked and what types of approaches are available for chatbot development | The study focuses on chatbot with textual modality, which is different from a voice assistant | Commerce
[36] | A Systematic Literature Review of Medical Chatbot Research from a Behavior Change Perspective | The study examined the literature on how people feel about using a medical chatbot in medical communication services. Moreover, the study recommended five design-orientation and highlighted the behavioral aspects such as acceptance, usage, and effectiveness when using chatbot | The study focuses on chatbot with textual modality, which has different factors than voice assistants | Health
[37] | A review of chatbot in education: Practical steps forward | The study focused on chatbot applied within an educational environment; it highlighted how chatbot are currently being used in a broader educational environment. Moreover, the study also recommended how chatbot can be applied to enhance students’ learning experience | The study focuses on chatbot, with textual modality, with different factors than voice assistants | Education
[38] | Human-like communication in conversational agents: a literature review and research agenda | The study identified the voice assistant human-like behaviors that have the most effect on relational outcomes during communication | An in-depth analysis of user and conversation assistants attributes was not carried out. Moreover, the study only focused on voice assistants used in management alone | Management
e. Released between 2000 and 2021, because during this period voice assistants started to gain notable popularity.

The exclusion criteria are:

a. Studies with poor research design, where the study's purpose is not clear, are excluded.
b. White papers, posters, and academic theses are excluded.

Search Query

We created the search query for our study using keywords arranged to search the relevant databases. We went through previous studies to find the most relevant search keywords commonly used in usability studies. After numerous debates among the researchers and seeking the opinions of two HCI experts, we chose the following set of keywords: usability, user experience, voice assistants, personal assistants, conversational agents, Google Assistant, Alexa, and Siri. We connected the keywords with logical operators (AND and OR) to yield accurate results. The final search string used was (“usability” OR “user experience”) AND (“voice assistants” OR “personal assistants” OR “conversational agents” OR “Google Assistant” OR “Alexa” OR “Siri”). The search was limited to the abstract and title of the study.

Database and Article Selection

Figure 1 highlights the graphic presentation of the selection and filtering process. The figure is adapted from the Prisma flow diagram [51]. As earlier stated, two databases are used as the sources for our article selection: the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE). Both databases we used in our study contain the most advanced studies on VA and are highly recognized among the HCI community. The search query returned 340 results from the ACM database and 280 results from the IEEE database. 720 items in both databases were checked for duplication and 165 documents (23%) were found to be duplicated and hence removed. Additionally, more items were filtered by title and abstract. We utilized keyword match to search the title; however, the abstract was read to identify the eligibility criteria. In addition, 399 documents (72%) were removed because they did not meet the eligibility criteria. Finally, 121 documents were removed that were not consistent with the research objectives of our study. At the end of the screening process 29 articles (19%) were finally included in this literature review.

Quality Assessment

The selected items presented in Table 2 are used for assessing the quality of the selected articles. The process was deployed to ensure the reported contents fit into our research. The sections collected from articles such as the methodology used, analysis done, and the context of use within each article were vital to our study. Each question is scored on a three-point scale: “Yes” is scored as 1 point, which means the question is fully answered. “Partial” is scored as 0.5, which means the question is vaguely answered, and “No” is scored as 0, which means it is not answered at all. All 29 finally included articles passed the quality assessment phase.

Result and Analysis

List of Articles

This section lists and discusses the articles collected in the previous stage. Table 3 presents the list of all the
Fig. 1 Article selection process
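The search string quoted in the Search Query section combines two keyword groups, joined internally with OR and with each other by AND, and is applied to titles and abstracts. A minimal Python sketch of that logic follows. The function names (`build_query`, `matches`) are hypothetical, and the literal whole-phrase matching is an assumption: the ACM and IEEE search engines apply their own matching rules, which the text does not describe.

```python
import re

# Keyword groups taken from the search string quoted in the text.
USABILITY_TERMS = ["usability", "user experience"]
ASSISTANT_TERMS = [
    "voice assistants", "personal assistants", "conversational agents",
    "Google Assistant", "Alexa", "Siri",
]


def build_query(group_a, group_b):
    """Join each group with OR, then join the two groups with AND."""
    def or_group(terms):
        return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"
    return f"{or_group(group_a)} AND {or_group(group_b)}"


def contains_any(text, terms):
    """Case-insensitive whole-phrase match (an assumed matching rule)."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
        for term in terms
    )


def matches(title, abstract):
    """True if title plus abstract satisfy both sides of the AND."""
    text = f"{title} {abstract}"
    return contains_any(text, USABILITY_TERMS) and contains_any(text, ASSISTANT_TERMS)


print(build_query(USABILITY_TERMS, ASSISTANT_TERMS))
```

Because the match is literal, a record that only uses the singular "voice assistant" would not be caught by the "voice assistants" phrase in this sketch; a real replication would need to follow each database's own handling of plurals and stemming.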
Table 2 Quality assessment checklist
Checklist | Definition
C1 | Are the study aims and objectives clearly stated?
C2 | Is the article well designed to achieve these aims?
C3 | Are the independent variables in the study clearly defined?
C4 | Are the dependent variables in the study clearly defined?
C5 | Is the study discipline stated clearly?
C6 | Are the data collection methods clearly stated?
C7 | Does the study explain the reliability and validity of the measures?
C8 | Are the analysis techniques described adequately?
C9 | Are the users/participants’ numbers stated clearly?
C10 | Do the results add to the literature?
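The three-point scoring applied to the Table 2 checklist (Yes = 1, Partial = 0.5, No = 0) can be illustrated with a short Python sketch. The function name `score_article` and the example answers are hypothetical. The text does not state the score an article needed in order to pass, so no threshold is encoded here.

```python
# Point values for the three-point scale described in the text.
POINTS = {"yes": 1.0, "partial": 0.5, "no": 0.0}

# Checklist identifiers C1..C10 from Table 2.
CHECKLIST = [f"C{i}" for i in range(1, 11)]


def score_article(answers):
    """Return the total quality score for one article.

    `answers` maps each checklist id (e.g. "C1") to "Yes", "Partial" or "No".
    """
    missing = [item for item in CHECKLIST if item not in answers]
    if missing:
        raise ValueError(f"unanswered checklist items: {missing}")
    return sum(POINTS[answers[item].strip().lower()] for item in CHECKLIST)


# Hypothetical article: every item answered "Yes" except C7 and C10.
example = {item: "Yes" for item in CHECKLIST}
example["C7"] = "Partial"
example["C10"] = "No"
print(score_article(example))  # 8.5
```

With ten questions the total ranges from 0 to 10, which makes scores comparable across articles.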
compiled articles. Moreover, we identified the usability focus of each study.

Voice Assistant Usability Timeline

We grouped the collected research into three categories, each representing a range of time frames (Fig. 2). The categorization is based on voice assistant period breakthroughs. The first category is from 2000 to 2006, which were the years of social media and camera phones, also known as the years of the Y2K bug in telecommunications. During these years, conversational agents started to get noticed with the introduction of inventions such as Honda’s Advanced Step in Innovative Mobility (ASIMO) humanoid robot [80]. The second category ranges from 2007 to 2014. During these years technological advancements got users more exposed to voice assistants through embedding them into smartphones and computers. For instance, Apple first introduced Siri in 2011 [81], and Microsoft introduced Cortana in 2014. The last category ranges from 2015 to 2021. This was when the massive adoption of voice assistants took place, making it an all-time high.
Based on the year of publication of our selected articles, Fig. 2 clearly shows that the study on VA’s has expanded significantly over the last six years (2014–2021). This can be attributed to the invention of a smart speaker and phone with built-in voice agents [82]. Another reason for VA popularity is the COVID-19 outbreak that has given a fresh impetus towards touchless interaction technologies like voice [83].

Different Embodiment Types of VA’s

Smart speakers are the most used embodiment of VA’s in our selected articles. This is due to the current popularity of commercial smart speakers such as Alexa, HomePod, etc. A 2019 study showed that 35% of US households have an intelligent smart speaker, and this is projected to reach 75% by 2025 [84]. Use of humanoids is also popular because usability measures such as anthropomorphism are essential for voice assistant usability [85]. Furthermore, Fig. 3 shows that only a few studies were done on car interface voice assistants. Car interfaces are voice assistants that act as intermediaries between the driver and the car. The VA car interface allows drivers to access car information and also be able to perform the task without losing focus on driving. The fourth type of software interface refers to a voice assistant software embedded inside smartphones or computers. The studies we have collected have used either the commercialized form of the software interface, such as Alexa and Siri, while others have developed new voice interfaces that are easily accessible to users due to the adoption of smartphones and computers assistants using programming codes and skills. Nevertheless, both are in the forms of different software agents.

Component of ISO 9241-11 Framework

The ISO 9241-11 framework highlights two components, the context of use and usability measure [18]. We concentrate on both components to highlight any correlations between usability metrics and the context of use in the selected articles. The context of use consists of the different independent variables along with the techniques used for analyzing them. Likewise, the usability measure represents the dependent variables, i.e., the effect that the independent variables have on the overall experience of the users. Accordingly, the analysis is presented in a bi-dimensional manner in the following sections.

Context of Use

Independent Variable We split the context of use into an independent variable and the techniques used. The inde-
Table 3 List of compiled articles
# | Article name | Voice assistant type | Usability measure | Year
1 | An Exploration of Speech-Based Productivity Support in the Car [52] | Car Interface | Effectiveness | 2019
2 | Exploring Effects of Conversational Fillers on User Perception of Conversational Agents [53] | Smart Speaker | Effectiveness, machine voice (perceived intelligence) | 2019
3 | “I Almost Fell in Love with a Machine”: Speaking with Computer Affects Self-disclosure [54] | Software Interface | Trust | 2019
4 | Clarifying False Memories in Voice-based Search [55] | Smart Speaker | Satisfaction, efficiency, cognitive load | 2019
5 | The Effects of Anthropomorphism and Non-verbal Social Behavior in Virtual Assistants [56] | Smart Speaker, Humanoid | Machine voice (perceived humanness, social presence), cognitive load (attention) | 2019
6 | An End-to-End Conversational Style Matching Agent [57] | Smart Speaker | Trust | 2019
7 | Tandem Track: Shaping Consistent Exercise Experience by Complementing a Mobile App with a Smart Speaker [58] | Smart Speaker, Software Interface | Efficiency, effectiveness | 2020
8 | Mapping Perceptions of Humanness in Intelligent Personal Assistant Interaction [59] | Smart Speaker, Software Interface | Machine voice (perceived humanness), effectiveness | 2019
9 | Pattern of Gaze in Speech Agent Interaction [60] | Humanoid | Machine voice (social presence), cognitive workload | 2019
10 | Conversational Interfaces for a Smart Campus: A Case Study [61] | Smart Speaker, Software Interface | Effectiveness | 2020
11 | Mental Workload and Language Production in Non-Native Speaker IPA Interaction [62] | Smart Speaker, Software Interface | Cognitive load, satisfaction | 2020
12 | User Experience of Alexa when controlling music – comparison of face and construct validity of four questionnaires [63] | Smart Speaker | User satisfaction | 2020
13 | Machine Body Language: Expressing a Smart Speaker’s Activity with Intelligible Physical Motion [64] | Humanoid | Machine voice (perceived humanness) | 2020
14 Measuring the anthropomorphism, animacy, Humanoid Machine Voice (Perceived humanness, Anthro- 2008
likeability perceived intelligence and per- pomorphism)
ceived safety of robots [65]
15 At Your Service: Designing Voice Assistant Car interface Attitude(Likeability, acceptance) 2019
Personalities to Improve Automotive [66]
16 Hey, Siri”, “Ok, Google”, “Alexa”. Accept- Smart Speaker Software Interface Attitude (Trust acceptance) 2019
ance- Relevant Factors of Virtual Voice-
Assistants [67]
17 User experience with smart voice assistants: Smart Speaker Software Interface User satisfaction 2019
the accent perspective [68]
18 Empathy is all you need: How a conversational Software Interface Effective, User satisfaction,Machine voice 2020
agent should respond to verbal abuse [69] (social presence)
19 Gendered Voice and Robot Entities: Percep- Humanoid User satisfaction, attitude, effectiveness 2009
tions and Reactions of Male and Female
Subjects [70]
20 What If Conversational Agents Became Invis- Smart speaker Attitude(trust), machine 2020
ible? Comparing Users’ Mental Models Voice(Anthropomorphism)
According to Physical Entity of AI Speaker
[71]
21 Similarity is more important than expertise: Smart Speaker Effectiveness, attitude(Trust), Efficiency, 2007
Accent effects in speech interfaces [72] satisfaction
22 Can Computer-Generated Speech Have Software Interface Attitude, User satisfaction 2020
Gender? An Experimental Test of Gender
Stereotype [73]
SN Computer Science
Table 3 (continued)
# Article name Voice assistant type Usability measure Years
23 Designing Social Presence of Social Actors in Software Interface satisfaction 2003

Human Computer Interaction [74]
24 Improving Automotive Safety by Pairing Software Interface Effectiveness, Efficiency 2005
Driver Emotion and Car Voice Emotion [74]
25 Designing Emotional Expressions of Conver- Humanoid Cognitive load, Attitude 2018
sational States for Voice Assistants: Modality
and Engagement [75]
26 The Use of Voice Input to Induce Human Smart Speaker Attitude 2018
Communication with Banking Chabot’s [76]
27 Face Value? Exploring the Effects of Embodi- Smart Speaker Software Interface Attitude 2018
ment for a Group Facilitation Agent [77]
28 Trust in artificial voices: A “congruency Humanoid Attitude 2018
effect” of first impressions and behavioral
experience [78]
29 Children Asking Questions: Speech Interface Smart Speaker Cognitive load 2018
Reformulations [79]
ple (user attributes), voice (voice assistant attributes), task,

conversational style, and anthropomorphic cues. The voice
and people categories are the oldest independent variables
used to measure usability. Their relevance is also seen in the
recent studies, which indicate that researchers have a high
interest in correlating users with the VA’s. On the other
hand, anthropomorphic clues and conversational styles are
relatively new to the measurement of usability. The task-
independent variable is the most used variable of late, per-
haps because users always test the VA’s ability to perform
certain tasks. It also indicates that VA’s are widely used for
various functional and utilitarian aspects. The anthropomor-
Fig. 2 Year of publication of selected articles phic cues are seldom used in the second phase (2007–2014).
However, it is most widely used in the last range (2015–
2021).
In Table 4 we highlight more details with regards to
the different groups of the independent variable collected,
and also present examples of the independent variables for
each category. We highlight how the independent vari-
ables have been applied by the previous studies and in
which environment they have been used. We defined each
independent variable category in Table 4, and explained
their sub-categories as well. As evident from Table 4, dif-
ferent independent variables are used together in multiple
studies. For example, independent voice variables and
Fig. 3 Embodiment of Voice assistant used in selected studies independent people variables are used simultaneously in
various studies, such as personality, gender, and accent.
Similarities between multiple independent variables aid
pendent variables presented in our study are the physical to understand the relationship between the variables
and mental attributes used to measure a given user inter- themselves and their relationship with the usability meas-
action outcome. Furthermore, our study grouped the inde- ures. Furthermore, the table also highlights the kind of
pendent variables into five main categories. The grouping is experiments carried out. Controlled experiments are effec-
shown in Fig. 4 and is based on the similar themes identified tive methods for understanding the immediate cause and
from the collected studies. The five groups included peo- effect between variables. However, a noticeable drawback
of controlled experiments is the absence of external validity. The results might not be the same when applied in real-world settings. For instance, a car simulation experiment takes place in a controlled environment, whereas a driver has no control over the domain in real life. The usability experience of the driver might be different in natural settings, and that difference might sometimes prove fatal.

Fig. 4 Categories of independent variable use over the years

Table 4 Independent variables and their categorization

Category: Voice
Definition: The voice category comprises independent variables that are associated with the voice assistants; these are attributes that the voice assistants possess.

  Independent variable: Voice personalities
  Instances: Energetic vs subdued; introvert and extrovert
  Applications: A study paired the driver's emotions with the car voice emotion state (energetic and subdued) to test the effectiveness of similarity between voice and user personality [68]. Another study showed that a voice personality similar to the user's creates more social presence [74].
  Environment: Simulation experiment, controlled environment

  Independent variable: Voice gender
  Instances: Male vs female
  Applications: Studies compared different gender voices (male and female) to measure social interaction and trust. Studies showed that a male voice has a more dominating effect on users than a female voice [54, 70, 74].
  Environment: Controlled environment, free real environment

  Independent variable: Voice accent
  Instances: Standard Southern British English accent vs Liverpool accent vs Birmingham accent vs synthetic voice; American accent vs Swedish accent; native English speaker vs non-English speaker
  Applications: Participants form trust expectations based on the voice accent. They tend to trust information delivered in a similar accent and by more knowledgeable, sophisticated voice assistants [68].
  Environment: Controlled environment

Category: People
Definition: The people category comprises independent variables that are associated with the users; these are attributes that the users can have.

  Independent variable: People gender
  Instances: Male vs female
  Applications: Studies showed that males and females view voice assistants differently in different aspects. The genders differ in their views on the embodiment of the voice assistants. Moreover, women trust voice assistants with a female voice. However, in a situation where there is a need for convincing, the male voice assistant is more efficacious [67].
  Environment: Controlled environment, free real environment, mixed environment

  Independent variable: Personality
  Instances: Introvert vs extrovert; happy vs upset
  Applications: A study showed that a person's emotional state or personality could be affected by the personality of the voice assistant [74]. Another study showed that social presence is created when a person uses a voice assistant with a similar personality [54].
  Environment: Simulation experiment, controlled environment

  Independent variable: Query expression
  Instances: Abuse (insult, threat, swearing)
  Applications: A study instructed users to insult the voice assistant while communicating with it, and the VA's response affected the user's outlook and usability [69].
  Environment: Controlled environment

  Independent variable: Experience
  Instances: UX metric, self-efficacy
  Applications: One study measured face validity and construct validity by correlating the UX scores of questionnaires with each other. Another study showed that participant self-efficacy and experience affect the trust, privacy, and language performance of the voice assistant [74].
  Environment: Controlled environment, survey

  Independent variable: Voice accent
  Instances: American accent vs Swedish accent; native English speaker vs non-English speaker
  Applications: Participants tend to trust information delivered in a similar accent, followed by more knowledgeable content. Native English speakers exhaust more mental models when interacting with voice assistants [52, 68, 72].
  Environment: Controlled environment, free real environment, mixed environment

Category: Task characteristic
Definition: The task characteristic category comprises independent variables that are associated with the tasks the user is expected to carry out during the interaction; this also includes the modality of the task.

  Independent variable: Modality
  Instances: Voice mode, textual mode, VA facial expression mode (smiley), mixed interface
  Applications: A study used modality to test the social presence of the VA. The study showed that participants feel a strong social presence when the textual modality personality matches the voice personality. Another study showed that nonverbal emotional expressions such as text box movement and the VA facial expression mode (smiley) affect user engagement [57].
  Environment: Emotional expression design experiment, interactive task, controlled environment

  Independent variable: Context
  Instances: Interactive task, drawing task, executable task, driving simulation task, auditory task, controlling device volume, audio speech to text
  Applications: A study used speech-to-text as a task for users while driving and measured driver engagement and concentration [52]. Another study applied a game theory concept, asking users to trust the VA in an investment scheme in which users held different opinions on which VA to trust [76].
  Environment: Emotional expression design experiment, interactive task, controlled environment, free real-life environment, simulation

Category: Conversational style
Definition: This is the nature of the conversation, either from the user during the query or from the response of the voice assistant.

  Independent variable: Response type
  Instances: Empathetic (avoidance vs empathy vs counterattack); clarifying query (no modification vs direct modification vs negatively clarified); conversational fillers ("um", "huh", "uh")
  Applications: The VA response affects the user's usability experience. A study showed that when VA's are insulted, their response type affects the participant's emotional engagement and attitude [69]. Another study showed that when the VA has more information on a query, the follow-up question affects user engagement and efficiency [55]. A further study showed that conversational fillers increase social interaction with the voice assistant [53].
  Environment: Controlled environment

  Independent variable: Communication form
  Instances: High Consideration (indirect) vs High Involvement (direct)
  Applications: A study used the participants' High Consideration and High Involvement linguistic styles and found that a style is effective when the voice assistant uses a similar linguistic style [57].
  Environment: Controlled environment, mixed environment

Category: Anthropomorphic cues
Definition: These are independent variables on voice assistants that exhibit human attributes and intelligence; this makes the user perceive the voice assistants as human.

  Independent variable / instances: Speech agents' personification vs speech agent personalization
  Applications: A study compared VA personification, personalization, and neither to measure users' trust and engagement when used by children and adults. The result showed that the personalized VA has the highest concentration and trust [79].
  Environment: Controlled environment

  Independent variable: Embodiment type
  Instances: Audio vs smart speaker; gaze vs no gaze; humanoid robot; smart speaker vs Anthropomorphic Robot (AMR) vs Anthropomorphic Social Robot (AMSR)
  Applications: Numerous studies have used embodiment type to measure usability. One study compared a VA with gaze against a VA without gaze to measure user anthropomorphism. Another study compared physical smart speakers with a voice-only setup without speakers to test user trust and engagement [56, 60, 67, 78].
  Environment: Controlled environment

Techniques Used We identified seven techniques that researchers have used, as shown in Fig. 5. Based on the data we collected, quantitative experiments are the most used and the oldest technique applied to voice assistants. The quantitative method is sometimes used as a standalone experiment and sometimes with other techniques [54]. It is worth noting that car simulation experiments involving VA's were first used in 2000. Other experiments on human communication with self-driving cars have been carried out since the 1990s, making it one of the oldest techniques for usability measurement. More accurate techniques, such as interaction design, were introduced later. The interaction design employed by studies such as [61] provides a real-time experiment scenario. This avoids drawbacks such as the bias that arises when using quantitative methods. Factorial designs are mostly used by studies that compare two or more entities in a case study [55]. They are utilized mainly by studies using two or more independent variables together.

Fig. 5 Techniques used in our studies over time

Usability Measure (Dependent Variable)

This subsection focuses on the usability measures identified in our research. Moreover, the findings are used to answer RQ1 and RQ3. The ISO 9241-11 framework groups usability measures into three categories: effectiveness, efficiency, and satisfaction. According to the ISO 9241-11 framework, "effectiveness is the accuracy and completeness with which users achieve specified goals", whereas "efficiency is the resources expended in relation to the accuracy and completeness with which users achieve goals" and "satisfaction is the freedom from discomfort and positive attitudes towards the use of the product" [18].

Fig. 6 Usability measures used over the years in our compiled articles

In numerous studies, the usability measures used were clearly outside the scope of the ISO 9241-11 framework. In total, we identified three additional usability categories: attitude, machine voice (anthropomorphism), and cognitive load. The graphical representation of the different usability measures identified in this study is presented in Figs. 6 and 7. Furthermore, the figures also highlight the percentage of studies that used usability measures within the ISO 9241-11 framework and those that used measures outside it. Based on our compiled result, user satisfaction and effectiveness are the earliest usability measures used when measuring VA usability. Some studies used performance and productivity as subthemes to measure effectiveness [62]. Usability has been measured both subjectively and objectively. For instance, studies have measured VA effectiveness subjectively by using quantitative methods such as questionnaires [72]. In contrast, other studies have used objective methods such as average completed interactions [69]. Multiple usability measures are sometimes applied in the same research; for instance, some studies measured effectiveness alongside efficiency and satisfaction [66, 70]. Learnability, optimization, and ease of use have been used as subthemes to measure efficiency. Interactive design is the most effective experiment type employed for obtaining real-time results [56, 79]. The ISO 9241-11 framework works well for effectiveness, efficiency, and satisfaction; however, users expect more from voice assistants given the recent advancement of VA capabilities. Our compiled result showed that more than half of the studies are not carried out in accordance with the standard ISO 9241-11 framework (Fig. 7). The other usability measures we identified outside the ISO 9241-11 framework are attitude, machine voice, and cognitive load.
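The split between measures inside and outside the ISO 9241-11 framework can be tallied mechanically. The following is a minimal sketch, not the procedure used in this review: it takes a small illustrative subset of the rows in Table 3, labels each study by whether its measures fall within the three ISO 9241-11 categories, and computes the share of studies that use at least one measure outside the framework. The names `classify` and `share_beyond_iso` are hypothetical, and mapping "Trust" to the attitude category follows the subthemes listed in Table 6.

```python
# Measures defined by the ISO 9241-11 framework, as described in the text.
ISO_9241_11_MEASURES = {"effectiveness", "efficiency", "satisfaction"}

# Illustrative subset of Table 3 (article number -> measures), not the full corpus.
# "Trust" is recorded as "attitude" and "User satisfaction" as "satisfaction";
# both mappings are assumptions made for this sketch.
STUDY_MEASURES = {
    1: ["effectiveness"],
    3: ["attitude"],
    4: ["satisfaction", "efficiency", "cognitive load"],
    6: ["attitude"],
    7: ["efficiency", "effectiveness"],
    9: ["machine voice", "cognitive load"],
    12: ["satisfaction"],
    29: ["cognitive load"],
}


def classify(measures):
    """Label a study as 'iso_only', 'non_iso_only', or 'mixed'."""
    used = {m.lower() for m in measures}
    inside = used & ISO_9241_11_MEASURES
    outside = used - ISO_9241_11_MEASURES
    if inside and outside:
        return "mixed"
    return "iso_only" if inside else "non_iso_only"


def share_beyond_iso(study_measures):
    """Percentage of studies using at least one measure outside ISO 9241-11."""
    labels = [classify(m) for m in study_measures.values()]
    beyond = sum(1 for label in labels if label != "iso_only")
    return 100.0 * beyond / len(labels)


if __name__ == "__main__":
    for number, measures in sorted(STUDY_MEASURES.items()):
        print(number, classify(measures))
    print(f"{share_beyond_iso(STUDY_MEASURES):.1f}% use a non-ISO measure")
```

The percentage printed here describes only the eight sample rows and should not be read as the figure reported in Fig. 7; running the same tally over all 29 articles would require encoding every row of Table 3.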
Attitude is a set of emotions, beliefs, and behaviors towards the voice assistants. Attitude results from a person's experience and can influence user behavior. Attitude is subject to change and is not constant. Understanding the user attitude towards the VA has become an active research area. Numerous studies have used different methods to measure subthemes of attitude such as trust, closeness, disclosure, smartness, and honesty [60, 78]. Likeability is also a subtheme of attitude, and it has been used to measure the compatibility, trust, and strength between the user and VA's [56, 57]. Moreover, embodiment type affects the user attitude as well. One study highlighted how gaze affects the user attitude toward the VA [59] and showed that a VA with gaze creates trust.

Fig. 7 Percentage of ISO 9241-11 framework usability measures and non-ISO 9241-11

We defined machine voice (anthropomorphism) as the user's attribution of human characteristics and human similarity to the voice assistant. We considered machine voice an important usability measure that only applies to voice assistants due to their primary modality being the voice. The measurement of machine voice has spiked recently, and it has clearly been drawing a lot of interest. One of the direct purposes of the VA is to sound as human as possible. When users perceive the machines to be more human, they build more trust, which results in a better usability experience.

Cognitive load might be mistaken for efficiency; nevertheless, they are different. We defined cognitive load as the amount of mental capacity a person applies to communicate successfully with the VA. With VA's, actions such as giving commands require cognitive effort. Cognitive load is measured by specific characteristics unique to the VA, such as attention time during the use of the VA [76] and the user's mental workload during use [77].

To answer RQ1 (can the ISO 9241-11 framework be used to measure the usability of the VA's?): none of the existing works have used the ISO 9241-11 framework alone for usability evaluation. It has been supplemented by the other factors presented above, which are outside the scope of this framework.

Relationship Between the Independent Variables and Usability Measures

After identifying the independent and dependent variables, we show in Tables 5 and 6 how they are inter-related, to provide a better understanding of the usability scenario of the VA's. While Table 5 focuses on the ISO 9241-11 specific factors, Table 6 considers the non-ISO factors. The independent variables are grouped into categories and represented by table rows, with every category consisting of multiple independent variables. The usability measures are presented in the table columns. Every usability measure is made up of different subthemes, which are all presented in the tables as well. The tables highlight the relationship between the independent variables and the usability measures. An "X" mark in a cell indicates that a study exists linking that independent variable and usability measure subtheme, whereas an empty cell indicates that no study has been carried out to link them.

Table 5 Relationship between independent variables and ISO 9241-11 framework measurement
Dependent variables: Effectiveness, Efficiency, Satisfaction
Independent variables | Productivity | Performance | Value | Learnability | Optimization | Ease of use | Feasibility | Decision making | User experience | Continued use | Conformity
Voice assistants
Personalities x x x x x
Gender x x x x x
Accent x x x x x
People
Gender x x x x
Personality x x x
Query expression
Experience x x x
Accent x x x x x x
Task characteristics
Modality x x x x x x x
Context x x
Communication type
Response type x x x x x
Conversational type x x x x x x
Anthropomorphism
Embodiment type x x x x x x
Humanoid/robot x x
Smart speaker, robot, anthropomorphic robot x x x x x x

Table 6 Relationship between independent variables and non-ISO 9241-11 framework measurement
Dependent variables: Attitude, Machine voice, Cognitive load
Independent variables | Trust | Likeability | Acceptance | Perceived intelligence | Perceived humanness | Social presence | Mental workload | Attention
Voice assistant
Personalities x x x x
Gender x x x x x
Accent x x x x x x
People
Gender x x x
Personality x x x x x
Query expression x x
Experience x x x x
Accent x x x x x x
Task characteristics
Modality x x x x x x x
Context x x x x
Communication type
Response type x x x x x x
Conversational type x x x x x x
Anthropomorphism
Embodiment type x x x x x x x
Humanoid/robot x x x x x
Smart speaker, robot, anthropomorphic robot x x x x

Discussion

Independent Variable and Usability Measures

Our study revealed what has previously been carried out in VA usability and the gaps that are yet to be addressed. We analyzed the usability measures and their relationship to the independent variables. VA's are also easily accessible due to the development of different embodiment types such as speakers, humanoids, and robots. However, there is much less focus on embodiment types and their relationship to effectiveness and anthropomorphism, which needs more attention. Some relationship gaps and associations are apparent, while some are vague. For instance, the independent variable "accent" has often been connected with its effectiveness on users. However, what is left unanswered is whether VA accents have the same efficacy on users of the same or different genders. Another notable gap is gender and efficiency, on which there are very few studies. This will be an essential aspect to understand and apply given the recent massive adoption of voice assistants in different contexts. Another obvious gap is the relationship between query expression and any of the ISO 9241-11 framework measures. Query expression is how a user expresses their query to the voice assistant. Query expression has been known to increase the trust and attitude of the user towards the VA. However, its relationship to usability measures such as efficiency, satisfaction, and effectiveness is still under-researched. Knowing the right way to ask queries (questions) defines the type of response a user gets. An incorrect response will be received if the right question is expressed incorrectly. From a mental model perspective, when a user has to spend too much energy and thought to frame a question, it affects the VA's efficiency and satisfaction. However, this has not been proven by any current study.

The VA response types increase effectiveness and trust. However, their relationship to user acceptance is still unknown. Another exciting intersection is that of anthropomorphic cues and attitude, which results from an emotional response to anthropomorphism rather than a practical one. Attitude is an emotional response to a given state, hence its strong connection with anthropomorphism. The attitude toward the VA is a highly researched area [86]. Trust, likeability, and acceptance are the subthemes used for the attitude usability measure. This can be attributed to the importance of trust while using emergent technologies such as voice assistants. User trust in voice assistants is an essential aspect with the rise of IoT devices, and user mistrust affects the acceptance and effectiveness of the VA's [87]. Multiple studies measured user trust while using machine voice categories as an independent variable. That could be attributed to the lack of a GUI in VA's. Furthermore, the voice modality must be enough
to cultivate user trust. Notably, subjective methods were widely employed when measuring user attitudes; although subjective measures often relate to the variables they are intended to capture, they are also affected by cognitive biases.

The ISO 9241-11 framework is an effective tool when measuring effectiveness, efficiency, and satisfaction. However, it is not applicable when measuring usability aspects such as attitude, machine voice, and mental load. These are all measurements that are uniquely associated with voice assistants. Therefore, the ISO 9241-11 framework could be expanded to include such usability aspects.

Technique Employed

The factorial design adapts well when used in matched-subject design experiments [56]. Based on the studies collected, machine learning is not widely used as an analytic tool in usability. This could be attributed to the technical aspects of machine learning and to it still being a relatively new field. However, with third-party machine learning tools, more analysis will be carried out. Wizard of Oz and interactive design started gaining popularity in 2015–2021. Moreover, the Wizard of Oz and interactive techniques are more effective when using independent variables such as anthropomorphic cues. The anthropomorphic cue independent variables are used with Wizard of Oz techniques and interaction design more than with any other technique. This could be attributed to the importance of using objective methods to avoid biased human responses. Furthermore, "machine voice" is a fairly popular usability measure. This could be attributed to VA developers trying to give the VA more human and intelligent attributes. The more users perceive the machine voice as intelligent and humanlike, the more they trust and adopt it. More objective techniques should be created and used on the independent variables when measuring machine voice. Subjective techniques such as quantitative methods are easy to use and straightforward. However, they can produce biased results.

Interactive design experiments are the most commonly used technique for measuring usability. However, the interaction depends on voice modality, making it different from traditional interaction design, which uses visual cues as one of its essential components. Moreover, interaction design also triggers an emotional response, which makes it effective when measuring user attitude. The absence of visual elements in the interactive design used might arguably defeat the purpose of clear communication. A new standard of interaction design specifically for the voice modality should be developed.

Future Works and Limitation

One limitation of our study was the use of only a few databases as the source of our articles; in future studies, we intend to add more journal databases such as Scopus and Taylor and Francis. The majority of the experimental studies we collected were conducted in a controlled environment; future studies will focus on usability measures and independent variables that are used in natural settings, so that the results can be compared. More studies should be carried out on objective techniques and on how they could be combined with subjective techniques. This is vital because, with the rise of user expectations of voice assistants, it will be essential to understand how techniques complement each other in each usability measurement.

Conclusion

Our study aimed to understand what is currently employed for measuring voice assistant usability, and we identified the different independent variables, dependent variables, and techniques used. Furthermore, we also focused on using the ISO 9241-11 framework to measure the usability of voice assistants. Our study classified five independent variable classes used for measuring the dependent variables. These separate classes were categorized based on the similarities between the member groups. Also, our study used the three usability measures in the ISO 9241-11 framework in conjunction with three others to serve as the dependent variables. We uncovered that voice assistants such as car interface speakers were not studied enough, and currently, smart speakers have the most focus. Dependent variables such as machine voice (anthropomorphism) and attitude have recently received more attention than the older usability measures, such as effectiveness. We also uncovered that usability is dependent on the context of use, in that the same independent variables could be used with different usability measures. Our study highlights the relationship between the independent and dependent variables used by other studies. In conclusion, our study used ISO 9241-11 to analyse usability. We also highlight what has been carried out on VA usability and what gaps are left. Moreover, we concluded that even though a lot of usability measurement has been carried out, there are still many aspects that have not been researched. Furthermore, the current ISO 9241-11 framework is not suitable for measuring the recent advancement of VA's because user needs and expectations have changed with the rise of technology. Using the ISO 9241-11 framework will create ambiguity in explaining some usability measures such as machine voice, attitude, and cognitive load. However, it has the potential to be a foundation for future VA usability frameworks.
Funding This study was funded by The Asahi Glass Foundation.

Declarations

Conflict of Interest The authors declare that they have no conflict of interest.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.