A Systematic Review of Voice Assistant Usability: An ISO 9241-11 Approach
https://doi.org/10.1007/s42979-022-01172-3
ORIGINAL RESEARCH
Received: 6 January 2022 / Accepted: 20 April 2022 / Published online: 3 May 2022
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2022
Abstract
Voice assistants (VAs) are an emerging technology that has become an essential tool of the twenty-first century. Their ease of access and use has generated considerable interest in their usability. Usability is an essential aspect of any emerging technology, with every technology having a standardized usability measure. Despite the high acceptance rate of VAs, to the best of our knowledge, few studies have been carried out on their usability. In this context, we reviewed studies that used voice assistants for various tasks. Our study highlights the usability measures currently used for voice assistants, both within the ISO 9241-11 framework and outside of it, to provide a comprehensive view, as well as the independent variables used and their context of use. We employed the ISO 9241-11 framework as the measuring tool in our study. We identify a diverse range of independent variables that were used to measure usability, and we also specify the independent variables that have not yet been used to measure some usability experiences. We conclude with what has been carried out on voice assistant usability measurement and what research gaps are present. We also examine whether the ISO 9241-11 framework can be used as a standard measurement tool for voice assistants.
Keywords Voice assistants · Systematic literature review · Usability · User experience · ISO 9241-11 framework
SN Computer Science
267 Page 2 of 23 SN Computer Science (2022) 3:267
User Interface (VUI), and the heuristic Voice User Interface (VUI), to evaluate the ease of use of the VAs. The study affirmed that both heuristics were appropriate. However, the study noted that one was less problematic to use than the other [12]. A further study tested VUI heuristics to measure VA efficacy [13]. However, a critical factor that prevents VAs from adopting the heuristics currently available is the absence of a graphical user interface (GUI). Despite numerous studies on heuristics, the level of satisfaction is still low [14]. Furthermore, heuristics cannot be used as a standardized approach because they are approximate strategies or empirical rules for decision-making and problem-solving that do not ensure a correct solution. According to a study by Murad [16], the absence of standardized usability guidelines when developing VA interfaces presents a challenge in the development of an effective VA [15]. Another report from Budi & Leipheimer [17] also suggests that the usability of VAs requires improvement and standardization [16]. To create a standard tool, a globally recognized and well-known organization is critical in the process because it eliminates bias and promotes neutrality [17]. The International Organization for Standardization (ISO) 9241-11 framework is one of the standard usability frameworks widely used for measuring technology acceptance.

According to the ISO 9241-11 framework, usability is defined as “the degree to which a program may be utilized to achieve measurable objectives with effectiveness, efficiency, and satisfaction in a specific context of usage” [18]. ISO 9241-11 provides a framework for understanding and applying the concept of usability in an interactive system and environment [19]. The main advantage of using the ISO standard is that industries and developers do not need to build different design measurement tools. This standard is intended to create compatibility with new and existing technologies, and also to create trust [20]. Currently, system developers do not have any standardized tool created specifically for the measurement of VA usability; consequently, the measures are decentralized, causing confusion among developers. The lack of in-depth assessment of the current heuristics used in VA design affects the trust and adaptability of their users [15]. Other emerging technologies such as virtual reality [21] and game design [22] have understood the importance of creating an acceptable standardized measurement tool when designing new interfaces. Therefore, VA technology could also benefit significantly from the same concept. As evident from the above discussion, there is little to no focus on VA standardization.

Our study presents a systematic literature review comprising works carried out on the usability of voice assistants. In addition, we use the ISO 9241-11 framework as a standardized measurement tool to analyze the findings from the studies we collected. We chose the ACM and IEEE databases for the selection of our articles because both contain a variety of studies dealing with the usability aspects of VAs. The following are the contributions of this literature review to the Human-Computer Interaction (HCI) community:

1. Our work highlights the studies currently carried out on VA usability. This includes the independent and dependent variables currently used.
2. Our study highlights the factors that affect the acceptance of voice assistants and impact the user's total experience.
3. We identify and explain some attributes unique to voice assistants, such as machine voice.
4. We also highlight the evaluation techniques used in previous studies to measure usability.
5. Finally, our study tries to compare the existing usability studies with the ISO 9241-11 framework. The decentralized approach to VA usability measurement makes it unclear whether the ISO 9241-11 framework is being adhered to whilst developing the usability metrics.

We hope that our input will highlight the integration of the existing VA usability measures with the ISO 9241-11 framework. This will also verify whether the ISO 9241-11 framework can serve as a standard measure of usability in voice assistants. In summary, our study tries to answer the following four research questions:

• RQ1: Can the ISO 9241-11 framework be used to measure the usability of VAs?
• RQ2: What are the independent variables used when dealing with the usability of VAs?
• RQ3: What current measures serve as the dependent variables when evaluating the usability of VAs?
• RQ4: What is the relationship between the independent and dependent variables?

The remainder of this work is structured as follows. The second section presents the related work. It highlights which previous literature reviews have been carried out on voice agents' usability; furthermore, the section also highlights the emerging technologies that employed the ISO 9241-11 framework as a usability measuring tool. This is followed by the methodology section, which presents the inclusion and exclusion criteria used together with the review protocol. Furthermore, the query created for the database search is presented, and the databases to be used are selected. The fourth section presents the results and analysis. In this phase, the articles used for this study are listed, and the research questions are answered. The fifth section contains a discussion of the result analysis. This includes a more detailed explanation of the relationships between independent and dependent variables. Our insights and observations are included in this section as well.
ISO 9241-11 is a usability framework used to understand usability in situations where interactive systems are used and employed, which includes framework environments, products, and services [39]. Nigel et al. [40] conducted a study to revise the ISO 9241-11 framework standard, which reiterates the importance of the framework within the concept of usability. A number of studies have been conducted on various technologies using the ISO 9241-11 framework as a tool to measure their usability. This shows the diversity of approaches when using the framework. For instance, a study by Karima et al. (2016) proposed the use of the ISO 9241-11 framework by developers to measure the usability of mobile applications running on multiple operating systems; the study identified display resolution and memory capacity as factors that affect the usability of mobile applications [41]. Another study used the ISO 9241-11 framework to identify usability factors when developing e-government systems [42]. This study focused on the general aspect of e-government system development and concluded the framework could be used as a usability guideline when

1. Inclusion and exclusion criteria
2. Search query
3. Database and article selection
4. Quality assessment

Inclusion and Exclusion Criteria

The inclusion and exclusion criteria used in our study were developed for completeness and avoidance of bias. The criteria we used for our study are:

a. Studies that focus on VAs, with voice being the primary modality. In scenarios where text or graphical user interfaces are involved, they should not be the primary focus.
b. Studies are only in the English language, to avoid mistakes during translation from another language.
c. The studies include at least one user and one voice assistant, to ensure that the focus is on usability, not system performance.
d. The study has a comprehensive conclusion.
Table 1 Current literature reviews
Columns: #, Article name, Summary, Limitations, Usability focus

[23] Article name: Smart Home Voice Assistants: A Literature Survey of User Privacy and Security Vulnerabilities
Summary: The study explores the potential use vulnerabilities encountered while using the voice assistant. The ... the use of voice assistant
Limitations: Privacy and vulnerability are not the primary focus in usability
Usability focus: Personal Smart Home use

[24] Article name: Intelligent personal assistants: A systematic literature review
Summary: The natural language interfaces allow the human–computer interaction by the translation of the human intention in the controls of the devices, the analysis of the speech or the gestures of the user. The article looked at the major trends, critical areas and challenges of an intelligent personal assistant. The study also proposed a taxonomy for IPA classification. The method used the population, intervention, comparison, outcome, and context (PICOC) criteria
Limitations: The study did not conduct a thorough review of what was done with respect to the usability of the voice assistant
Usability focus: General use

[25] Article name: Virtual Assistants for Learning: A Systematic Literature Review
Summary: The motivation, commitment and decreasing interest of students in the learning process has always existed, contributing to increased failures and dropouts. This can be attributed to the difficulties with time management. The growing number of students in higher education makes it impossible to provide individual tutoring and support to each student. This paper systematically examines the use of virtual assistants in tertiary education. It focuses on the technology which fuels them, their characteristics and their impact in the learning process
Limitations: The study focused on voice assistants used only within an educational environment that motivates users
Usability focus: Education

[26] Article name: Voice-Based Conversational Agents for the Prevention and Management of Chronic and Mental Health Conditions: Systematic Literature Review
Summary: Chronic and mental diseases are increasingly prevalent throughout the world. As devices in our everyday lives offer more and more voice-based self-service, voice assistant can support the prevention and management of these conditions. This study highlights the current methods used in the evaluation of health interventions for the prevention and management of chronic and mental health conditions delivered through voice assistant
Limitations: The study only focused on voice assistants used in the health environment alone
Usability focus: Health

[27] Article name: Tourists’ Attitudes toward the Use of Artificially Intelligent (AI) Devices in Tourism Service Delivery: Moderating Role of Service Value Seeking
Summary: This study examines tourist attitudes towards the use of voice assistants in relatively more utilitarian or hedonic (air and hotel) tourism services. The results of the study suggest that tourism acceptance of VA is influenced by social influence, hedonistic motivation, anthropomorphism, expectation of performance and exertion, and
Limitations: The study was not on usability, to be specific, but on adopting voice assistants in the tourism environment
Usability focus: Tourism

[30] Article name: Evaluation of COVID-19 Information Provided by Digital Voice Assistants
Summary: Digital voice assistants are widely used to search for health information during COVID-19. With ... consumer needs and prevent disinformation. The goal of this study is to evaluate the COVID-19 information provided by voice assistants in terms of relevance, accuracy, usability and reliability. The study found that information about this pandemic is evolving rapidly and that users must use good judgment when obtaining COVID-19 information from voice assistants
Limitations: The study focused only on voice assistants used in COVID-19 related issues
Usability focus: Health

[31] Article name: The human side of human-Chatbot interaction: A systematic literature review of ten years of research on text-based chatbot
Summary: Over the last ten years there has been a growing interest around text-based chatbot, software applications interacting with humans using natural written language. However, despite the enthusiastic market predictions, ‘conversing’ with this kind of agents seems to raise issues that go beyond their current technological limitations, directly involving the human side of the interaction. This study suggests a number of research opportunities that could be explored over the next few years
Limitations: The study focused only on chatbot with textual modality. Moreover, chatbot use a Graphic User Interface that is not present in voice assistants
Usability focus: General use

[32] Article name: Voice in Human–Agent Interaction: A Survey
Summary: Social robots, conversational agents, voice assistants and other embodied AIs are increasingly a characteristic of daily life. The connection between these different types of intelligent agents is their ability to interact with people by voice. The voice becomes an essential mode of embodiment, communication and interaction between IT operators and end users. This study presents a meta-synthesis of the voice of agents in the conception and experience of agents from a man-centered point of view: voice assistant
Limitations: The study did not use the ISO 9241-11 framework as a reference in their measurement scale
Usability focus: General use

[33] Article name: Usability of Chatbots: A Systematic Mapping Study
Summary: The use of chatbot has increased considerably in recent years. As a result, it is essential to integrate conviviality into their development. The study identifies the state of the art in the conviviality of chatbot and the applied techniques of human–computer interaction, to analyze how to assess the conviviality of chatbot
Limitations: The study focused only on chatbot with textual modality. Moreover, chatbot use a Graphic User Interface that is not present in voice assistants
Usability focus: General use

[34] Article name: A Literature Review On Chatbot In Healthcare Domain
Summary: The study highlighted chatbot used in the healthcare environment. Also, it compares the techniques such as NLU, NLG, and ML used in chatbot development
Limitations: The study deals with chatbot with textual modality. Also, the study deals with chatbot used only in a healthcare environment
Usability focus: Health

[35] Article name: Review of Chatbot Design Techniques
Summary: The study reviewed the techniques and factors considered when designing a chatbot. Also it highlighted how chatbot worked and what are the
Limitations: The study focuses on chatbot with textual modality, which is different from a voice assistant
Usability focus: Commerce
e. Released between 2000 and 2021, because during this period voice assistants started to gain notable popularity.

The exclusion criteria are:

a. Studies with poor research design, where the study's purpose is not clear, are excluded.
b. White papers, posters, and academic theses are excluded.

Search Query

We created the search query for our study using keywords arranged to search the relevant databases. We went through previous studies to find the search keywords most commonly used in usability studies. After numerous debates among the researchers and after seeking the opinions of two HCI experts, we chose the following set of keywords: usability, user experience, voice assistants, personal assistants, conversational agents, Google Assistant, Alexa, and Siri. We connected the keywords with logical operators (AND and OR) to yield accurate results. The final search string used was (“usability” OR “user experience”) AND (“voice assistants” OR “personal assistants” OR “conversational agents” OR “Google Assistant” OR “Alexa” OR “Siri”). The search was limited to the abstract and title of the study.

Database and Article Selection

search query returned 340 results from the ACM database and 280 results from the IEEE database. 720 items in both databases were checked for duplication and 165 documents (23%) were found to be duplicated and hence removed. Additionally, more items were filtered by title and abstract. We utilized keyword match to search the title; however, the abstract was read to identify the eligibility criteria. In addition, 399 documents (72%) were removed because they did not meet the eligibility criteria. Finally, 121 documents were removed that were not consistent with the research objectives of our study. At the end of the screening process 29 articles (19%) were finally included in this literature review.

Quality Assessment

The selected items presented in Table 2 are used for assessing the quality of the selected articles. The process was deployed to ensure the reported contents fit into our research. The sections collected from articles, such as the methodology used, the analysis done, and the context of use within each article, were vital to our study. Each question is scored on a three-point scale: “Yes” is scored as 1 point, which means the question is fully answered; “Partial” is scored as 0.5, which means the question is vaguely answered; and “No” is scored as 0, which means it is not answered at all. All 29 finally included articles passed the quality assessment phase.
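The three-point quality scoring described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `quality_score` and the example answers are hypothetical, the real questions are the ones listed in Table 2, and no pass threshold is assumed because the text does not state one.

```python
# Points per answer, as described in the Quality Assessment text.
SCORES = {"yes": 1.0, "partial": 0.5, "no": 0.0}


def quality_score(answers):
    """Sum the points for one article's answers to the quality questions.

    Answers are normalized (trimmed, lower-cased) so that variants such as
    "Yes", "NO " and "partial" all map onto the same three-point scale.
    """
    return sum(SCORES[answer.strip().lower()] for answer in answers)


# Hypothetical answers for one article (not taken from the reviewed studies).
example_answers = ["Yes", "Partial", "No", "Yes"]
print(quality_score(example_answers))
```

Keeping the scale in a single mapping makes it easy to change the weights if a different scoring scheme is adopted.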
compiled articles. Moreover, we identified the usability focus of each study.

Voice Assistant Usability Timeline

We grouped the collected research into three categories, each representing a range of time frames (Fig. 2). The categorization is based on breakthrough periods for voice assistants. The first category is from 2000 to 2006, which was the era of social media and camera phones, also known as the era of the Y2K bug in telecommunications. During these years, conversational agents started to get noticed with the introduction of inventions such as Honda’s Advanced Step in Innovative Mobility (ASIMO) humanoid robot [80]. The second category ranges from 2007 to 2014. During these years, technological advancements exposed users more to voice assistants by embedding them into smartphones and computers. For instance, Apple first introduced Siri in 2011 [81], and Microsoft introduced Cortana in 2014. The last category ranges from 2015 to 2021. This was when the massive adoption of voice assistants took place, reaching an all-time high.

Based on the year of publication of our selected articles, Fig. 2 clearly shows that the study of VAs has expanded significantly over the last six years (2014–2021). This can be attributed to the invention of smart speakers and phones with built-in voice agents [82]. Another reason for VA popularity is the COVID-19 outbreak, which has given a fresh impetus towards touchless interaction technologies like voice [83].

Different Embodiment Types of VAs

Smart speakers are the most used embodiment of VAs in our selected articles. This is due to the current popularity of commercial smart speakers such as Alexa, HomePod, etc. A 2019 study showed that 35% of US households have an intelligent smart speaker, a share projected to reach 75% by 2025 [84]. Use of humanoids is also popular because usability measures such as anthropomorphism are essential for voice assistant usability [85]. Furthermore, Fig. 3 shows that only a few studies were done on car interface voice assistants. Car interfaces are voice assistants that act as intermediaries between the driver and the car. The VA car interface allows drivers to access car information and to perform tasks without losing focus on driving. The fourth type, the software interface, refers to voice assistant software embedded inside smartphones or computers. Some of the studies we collected used the commercialized form of the software interface, such as Alexa and Siri, while others developed new voice interfaces using programming code and skills; these are easily accessible to users due to the adoption of smartphones and computers. Nevertheless, both are forms of software agents.

Component of ISO 9241‑11 Framework

The ISO 9241-11 framework highlights two components, the context of use and the usability measure [18]. We concentrate on both components to highlight any correlations between usability metrics and the context of use in the selected articles. The context of use consists of the different independent variables along with the techniques used for analyzing them. Likewise, the usability measure represents the dependent variables, i.e., the effect that the independent variables have on the overall experience of the users. Accordingly, the analysis is presented in a bi-dimensional manner in the following sections.

Context of Use

Independent Variable We split the context of use into an independent variable and the techniques used. The inde-
Table 3 (continued)
# Article name Voice assistant type Usability measure Years
of controlled experiments is the absence of external validity. The results might not be the same when applied in real-world settings. For instance, a car simulation experiment is a controlled environment, whereas a driver has no control over the environment in real life. The usability experience of the driver might be different in natural settings, and that might sometimes prove fatal.

Techniques Used We identified seven techniques that researchers have used, as shown in Fig. 5. Based on the data we collected, quantitative experiments are the most used and the oldest technique applied to voice assistants. The quantitative method is sometimes used as a standalone experiment and sometimes with other techniques [54]. It is worth noting that car simulation experiments involving VAs were first used in 2000. Other experiments on human communication with self-driving cars have been carried out since the 1990s, making it one of the oldest techniques for usability measurement. More accurate techniques, such as interaction design, were introduced later. The interaction design employed by studies such as [61] provides a real-time experiment scenario. This avoids drawbacks such as the bias that arises when using quantitative methods. Factorial design is mostly used by studies that compare two or more entities in a case study [55]. It is utilized mainly by studies using two or more independent variables together.

Usability Measure (Dependent Variable)

This subsection of our study focuses on the usability measurement of our research. Moreover, the findings are used to answer RQ1 and RQ3. The ISO 9241-11 framework groups usability measures into three categories: effectiveness, efficiency, and satisfaction. According to the ISO 9241-11 framework, “effectiveness is the accuracy and completeness with which users achieve specified goals”, whereas “efficiency is the resources expended concerning accuracy and completeness in which users achieve goals” and “satisfaction is the freedom from discomfort and positive attitudes towards the use of the product” [18].

In numerous studies, the usability measures used were clearly outside the scope of the ISO 9241-11 framework. In total, we identified three additional usability categories: attitude, machine voice (anthropomorphism), and cognitive load. The graphical representation of the different usability measures identified in this study is presented in Figs. 6 and 7. Furthermore, the figures also highlight the percentage of studies that used usability measures within the ISO 9241-11 framework and those that used measures outside the framework. Based on our compiled results, user satisfaction and effectiveness are the earliest usability measures used when measuring VA usability. Some studies used performance and productivity as subthemes to measure effectiveness [62]. Usability has been measured both subjectively and objectively. For instance, studies have measured VA effectiveness by subjective means, using quantitative methods such as questionnaire tools [72]. In contrast, other studies have used objective methods such as average completed interaction [69]. Multiple usability measures are sometimes applied in the same research; for instance, some studies measured effectiveness alongside efficiency and satisfaction [66, 70]. Learnability, optimization, and ease of use have been used as subthemes to measure efficiency. Interactive design is the most effective experiment employed, as it provides real-time results [56, 79]. The ISO 9241-11 framework works well with effectiveness, efficiency, and satisfaction; however, users have more expectations from the voice assistant with the recent advancement of VA capabilities. Our compiled result showed that more than half of the studies are not carried out in accordance with the standard ISO 9241-11 framework
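The three ISO 9241-11 categories quoted in this subsection can be made concrete with a small sketch. The operationalizations below (task completion rate for effectiveness, completed tasks per minute for efficiency, mean questionnaire rating for satisfaction) are common illustrative choices and are assumptions, not measures taken from the reviewed studies; the `Session` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """One hypothetical user session with a voice assistant."""
    tasks_attempted: int
    tasks_completed: int
    seconds_spent: float
    satisfaction_ratings: List[int]  # e.g. answers on a 1-5 questionnaire scale


def effectiveness(session: Session) -> float:
    # Assumed proxy: share of attempted tasks that were completed.
    return session.tasks_completed / session.tasks_attempted


def efficiency(session: Session) -> float:
    # Assumed proxy: completed tasks per minute of interaction.
    return session.tasks_completed / (session.seconds_spent / 60)


def satisfaction(session: Session) -> float:
    # Assumed proxy: mean of the questionnaire ratings.
    return sum(session.satisfaction_ratings) / len(session.satisfaction_ratings)


# Hypothetical data, for illustration only.
demo = Session(tasks_attempted=10, tasks_completed=8,
               seconds_spent=240.0, satisfaction_ratings=[4, 5, 3, 4])
print(effectiveness(demo), efficiency(demo), satisfaction(demo))
```

Effectiveness and efficiency here are objective measures derived from interaction logs, while satisfaction is a subjective measure derived from a questionnaire, mirroring the subjective and objective split discussed in the text.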
Table 4 Independent variables and their categorization
Columns: Category definition, Independent variable, Instances, Applications, Environment

Category definition: Voice. The voice category comprised of independent variables that are associated with the voice assistants, these are attributes that the voice assistants possess

Independent variable: Voice personalities
Instances: (Energetic vs Subdued), (Introvert and extrovert)
Applications: A study paired the driver's emotions with that of the car voice emotion state (Energetic and Subdued) to test the effectiveness of similarity between voice and user personality [68]. Another study showed that a voice personality that uses a simi-
Environment: Simulation experiment, controlled environment

Table 4 (continued)

Independent variable: Query expression
Instances: Abuse (Insult, Threat, Swearing)
Applications: A study instructed the user to insult the voice assistants while communicating with it, and the
Environment: Controlled environment

Independent variable: Experience
Instances: UX metric, Self-Efficacy
Applications: The study measured the user face validity and construct validity by correlating UX scores of questionnaires with each other. Another study shows that participant self-efficacy and experience affect the trust, privacy and language performance of the voice assistant [74]
Environment: Controlled environment, survey

Independent variable: Voice accent
Instances: American Accent vs Swedish Accent, Native English speaker vs non-English speaker
Applications: Participants tend to trust information with a similar accent, then more knowledgeable content, English native speakers do exhaust more mental models when interacting with voice assistants [52, 68, 72]
Environment: Controlled environment, free real environment, mixed environment

Category definition: Task characteristic. The task characteristic comprised of independent variables that are associated with tasks that it’s expected the user to carry out during the interaction, this also include the modality of the task

Independent variable: Modality
Instances: Voice mode, Textual mode, VA Facial Expression mode (Smiley), Mixed Interface
Applications: A study used modality to test the social presence of the VA. The study shows participants feel a strong social presence when textual modality personality matches the voice personality. Another study showed that nonverbal emotional expressions such as text box movement and VA Facial Expression mode (Smiley) affect user engagement [57]
Environment: Emotional expression design experiment interactive task, controlled environment

Independent variable: Context
Instances: Interactive Task, Drawing Task, Executable Task, Driving simulation task, auditory Task Controlling device Volume, audio speech to text
Applications: A study used the speech to text as a task on users during driving. The study measured driver engagement and concentration during driving [52]. Another study used the game theory concept on the users and asked the users to trust the VA in an investment scheme, where the users have a different opinion on what VA to trust [76]
Environment: Emotional expression design experiment interactive task, controlled environment, free real life environment, simulation

Table 4 (continued)

Category definition: Conversational Style. This is the nature of the conversation from either the user during query or the response of the voice assistants

Independent variable: Response type
Instances: Empathetic (Avoidance vs Empathy vs Counterattack); Clarifying Query (No modification vs direct Modification vs negatively clarified); Conversational Fillers (“um”, huh, uh)
Applications: The VA response affects the user usability experience, and a study showed that when VA are insulted, their response type affects the participant's emotional engagement and attitude [69]. Another study showed that when VA has more information on a query, the
Environment: Controlled environment
Table 5 Relationship between independent variables and ISO 9241-11 framework measurement
Dependent variables Effectiveness Efficiency Satisfaction
Independent variables Productivity Performance Value Learnability Optimization Ease of use Feasibility Decision making User experience Continued use Conformity
Voice Assistants
Personalities x x x x x
Gender x x x x x
Accent x x x x x
People
Gender x x x x
Personality x x x
Query expression
Experience x x x
Accent x x x x x x
Task characteristics
Modality x x x x x x x
Context x x
Communication type
Response type x x x x x
Conversational type x x x x x x
Anthropomorphism
Embodiment type x x x x x x
Humanoid/robot x x
Smart speaker, robot, anthropomorphic robot x x x x x x
Table 6 Relationship between independent variables and non- ISO 9241–11 framework measurement
Dependent vari- Attitude Machine voice Cognitive load
ables
SN Computer Science (2022) 3:267
Independent vari- Trust Likeability Acceptance Perceived intel- Perceived human- Social presence Mental workload Attention
ables ligence ness
Voice assistant
Personalities x x x x
Gender x x x x x
Accent x x x x x x
People
Gender x x x
Personality x x x x x
Query expression x x
Experience x x x x
Accent x x x x x x
Task characteristics
Modality x x x x x x x
Context x x x x
Communication type
Response type x x x x x x
Conversational type x x x x x x
Anthropomorphism
Embodiment type x x x x x x x
Humanoid/robot x x x x x
Smart speaker, robot, anthropomorphic robot x x x x
to cultivate user trust. Notably, subjective methods were widely employed when measuring user attitudes. Although subjective measures often relate to the variables they are intended to capture, they are also affected by cognitive biases.

The ISO 9241-11 framework is an effective tool for measuring effectiveness, efficiency, and satisfaction. However, it is not applicable to usability measures such as attitude, machine voice, and mental load, all of which are measurements uniquely associated with voice assistants. Therefore, the ISO 9241-11 framework could be expanded to include such usability aspects.

Technique Employed

The factorial design adapts well when used in matched-subject design experiments [56]. Based on the studies collected, machine learning is not widely used as an analytic tool in usability research. This could be attributed to the technical demands of machine learning and to the fact that it is still a relatively new field. However, as third-party machine learning tools become available, more such analysis is likely to be carried out. Wizard of Oz and interactive design gained popularity between 2015 and 2021. Moreover, the Wizard of Oz and interactive techniques are more effective when using independent variables such as anthropomorphic cues. Anthropomorphic cue independent variables are used with Wizard of Oz techniques and interaction design more than with any other technique. This could be attributed to the importance of using objective methods to avoid biased human responses. Furthermore, "machine voice" is a fairly popular usability measure. This could be attributed to VA developers trying to give the VA more human and intelligent attributes. The more users perceive the machine voice as intelligent and humanlike, the more they trust and adopt it. More objective techniques should be created and applied to the independent variables when measuring machine voice. Subjective techniques such as quantitative methods are easy to use and straightforward. However, they can produce biased results.

Interactive design experiments are the most commonly used technique for measuring usability. However, the interaction depends on voice modality, making it different from traditional interaction design, which uses visual cues as an essential component. Moreover, interaction design also triggers an emotional response, which makes it effective when measuring user attitude. The absence of visual elements in the interactive design used might arguably defeat the purpose of clear communication. A new standard of interaction design specifically for voice modality should be developed.

Future Works and Limitation

One limitation of our study was the small number of databases used as our article source; in future studies, we intend to add more journal databases such as Scopus and Taylor and Francis. The majority of the experimental studies we collected were conducted in a controlled environment; future studies will focus on usability measures and independent variables that are used in natural settings, so that the results can be compared. More studies should be carried out on objective techniques and on how they could work together with subjective techniques. This is vital because, with the rise of user expectations of voice assistants, it will be essential to understand how techniques complement each other in each usability measurement.

Conclusion

Our study aimed to understand what is currently employed for measuring voice assistant usability, and we identified the different independent variables, dependent variables, and techniques used. Furthermore, we focused on using the ISO 9241-11 framework to measure the usability of voice assistants. Our study classified five independent variable classes used for measuring the dependent variables. These separate classes were categorized based on the similarities between the member groups. Also, our study used the three usability measures in the ISO 9241-11 framework in conjunction with three others to serve as the dependent variables. We found that voice assistants such as car interface speakers have not been studied enough, and that smart speakers currently receive the most focus. Dependent variables such as machine voice (anthropomorphism) and attitude have recently received more attention than older usability measures such as effectiveness. We also found that usability depends on the context of use; for example, the same independent variables could be used in different usability measures. Our study highlights the relationship between the independent and dependent variables used by other studies. In conclusion, our study used ISO 9241-11 to analyse usability. We also highlight what has been carried out on VA usability and what gaps are left. Moreover, we concluded that even though many usability measurements have been carried out, there are still many aspects that have not been researched. Furthermore, the current ISO 9241-11 framework is not suitable for measuring the recent advancements of VAs because user needs and expectations have changed with the rise of technology. Using the ISO 9241-11 framework will create ambiguity in explaining some usability measures such as machine voice, attitude and cognitive load. However, it has the potential to be a foundation for future VA usability frameworks.
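
As an illustration of the three ISO 9241-11 measures referred to in this section (effectiveness, efficiency, and satisfaction), the following minimal Python sketch shows one common way they might be computed from a voice assistant study. It is not taken from any of the reviewed studies: the `TaskAttempt` record and the function names are hypothetical, and the choice of task completion rate, mean time on task, and the System Usability Scale (SUS) as operationalisations is an assumption, since ISO 9241-11 does not prescribe specific metrics.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    """One participant's attempt at one voice task (hypothetical record)."""
    completed: bool   # whether the assistant fulfilled the request
    seconds: float    # time from first utterance to the end of the task


def effectiveness(attempts):
    """Share of attempts that were completed, as a value in [0, 1]."""
    return sum(a.completed for a in attempts) / len(attempts)


def efficiency(attempts):
    """Mean time on task in seconds, over completed attempts only."""
    times = [a.seconds for a in attempts if a.completed]
    return mean(times) if times else float("nan")


def sus_score(responses):
    """SUS score (0-100) from ten Likert responses, each between 1 and 5."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten item responses")
    # Odd-numbered items are positively worded: contribution is response - 1.
    odd = sum(r - 1 for r in responses[0::2])
    # Even-numbered items are negatively worded: contribution is 5 - response.
    even = sum(5 - r for r in responses[1::2])
    return (odd + even) * 2.5


if __name__ == "__main__":
    attempts = [
        TaskAttempt(True, 10.0),
        TaskAttempt(True, 20.0),
        TaskAttempt(False, 30.0),
        TaskAttempt(True, 30.0),
    ]
    print(effectiveness(attempts))
    print(efficiency(attempts))
    print(sus_score([3] * 10))
```

Measures such as attitude, machine voice, and cognitive load have no counterpart in this sketch, which reflects the gap in the framework discussed above.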
35. Bhirud N, Tataale S, Randive S, Nahar S. A literature review on chatbots in healthcare domain. Int J Sci Technol Res. 2019;8(7):225–31.
36. Ahmad NA, Che MH, Zainal A, Abd Rauf MF, Adnan Z. Review of chatbots design techniques. Int J Comput Appl. 2018;181(8):7–10.
37. Gentner T, Neitzel T, Schulze J and Buettner R. A systematic literature review of medical chatbot research from a behavior change perspective. In: 2020 IEEE 44th annual computers, software, and applications conference (COMPSAC). IEEE; 2020, pp. 735–740.
38. Cunningham-Nelson S, Boles W, Trouton L and Margerison E. A review of chatbots in education: practical steps forward. In: 30th Annual Conference for the Australasian Association for Engineering Education (AAEE 2019): Educators Becoming Agents of Change: Innovate, Integrate, Motivate. Engineers Australia; 2019, pp. 299–306.
39. Van Pinxteren MM, Pluymaekers M, Lemmink JG. Human-like communication in conversational agents: a literature review and research agenda. J Serv Manag. 2020;31:203–25.
40. Weichbroth P. Usability attributes revisited: a time-framed knowledge map. In: 2018 Federated Conference on Computer Science and Information Systems (FedCSIS). IEEE; 2018, pp. 1005–1008.
41. Bevan N, Carter J, Earthy J, Geis T, Harker S. New ISO standards for usability, usability reports and usability measures. In: International conference on human-computer interaction. Cham: Springer; 2016. p. 268–78.
42. Moumane K, Idri A, Abran A. Usability evaluation of mobile applications using ISO 9241 and ISO 25062 standards. Springerplus. 2016;5(1):1–15.
43. Yahya H, Razali R. A usability-based framework for electronic government systems development. ARPN J Eng Appl Sci. 2015;10(20):9414–23.
44. Alva ME, Ch THS, López B. Comparison of methods and existing tools for the measurement of usability in the web. In: International conference on web engineering. Berlin: Springer; 2003. p. 386–9.
45. He X, Persson H, Östman A. Geoportal usability evaluation. Int J Spatial Data Infrastruct Res. 2012;7:88–106.
46. Dietlein CS, Bock OL. Development of a usability scale based on the three ISO 9241–11 categories “effectiveness,” “efficacy” and “satisfaction”: a technical note. Accred Qual Assur. 2019;24(3):181–9.
47. Nik Ahmad NA and Hasni NS. ISO 9241–11 and SUS measurement for usability assessment of dropshipping sales management application. In: 2021 10th International Conference on Software and Computer Applications. 2021; pp. 70–74.
48. Kitchenham B. Procedures for performing systematic reviews. Keele Univ. 2004;33(2004):1–26.
49. Seaborn K, Miyake NP, Pennefather P, Otake-Matsuura M. Voice in human–agent interaction: a survey. ACM Comput Surv (CSUR). 2021;54(4):1–43.
50. Al-Qaysi N, Mohamad-Nordin N, Al-Emran M. Employing the technology acceptance model in social media: a systematic review. Educ Inf Technol. 2020;25(6):4961–5002.
51. Kitchenham B and Charters S. Guidelines for performing systematic literature reviews in software engineering; 2007.
52. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. https://doi.org/10.1136/bmj.n71.
53. Martelaro N, Teevan J and Iqbal ST. An exploration of speech-based productivity support in the car. In: Proceedings of the 2019 CHI conference on human factors in computing systems. 2019; pp. 1–12.
54. Jeong Y, Lee J and Kang Y. Exploring effects of conversational fillers on user perception of conversational agents. In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (CHI EA ’19). 2019; 1–6. https://doi.org/10.1145/3290607.3312913.
55. Yu Q, Nguyen T, Prakkamakul S and Salehi N. “I almost fell in love with a machine”: speaking with computers affects self-disclosure. In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (CHI EA ’19). 2019; pp. 1–6. https://doi.org/10.1145/3290607.3312918
56. Kiesel J, Bahrami A, Stein B, Anand A, and Hagen M. Clarifying false memories in voice-based search. In: Proceedings of the 2019 Conference on Human Information Interaction and Retrieval (CHIIR ’19). 2019; 331–335. https://doi.org/10.1145/3295750.3298961.
57. Kontogiorgos D, Pereira A, Andersson O, Koivisto M, Rabal EG, Vartiainen V and Gustafson J. The effects of anthropomorphism and non-verbal social behavior in virtual assistants. In: Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents (IVA ’19). 2019; 133–140. https://doi.org/10.1145/3308532.3329466
58. Hoegen R, Aneja D, McDuff D and Czerwinski M. An end-to-end conversational style matching agent. In: Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents (IVA ’19). 2019; 111–118. https://doi.org/10.1145/3308532.3329473
59. Luo Y, Lee B and Choe EK. TandemTrack: shaping consistent exercise experience by complementing a mobile app with a smart speaker. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20). 2020; 1–13. https://doi.org/10.1145/3313831.3376616
60. Doyle PR, Edwards J, Dumbleton O, Clark L and Cowan BR. Mapping perceptions of humanness in intelligent personal assistant interaction. In: Proceedings of the 21st International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI ’19). 2019. https://doi.org/10.1145/3338286.3340116.
61. Jaber R, McMillan D, Belenguer JS and Brown B. Patterns of gaze in speech agent interaction. In: Proceedings of the 1st International Conference on Conversational User Interfaces - CUI ’19 (the 1st International Conference). 2019; 1–10. https://doi.org/10.1145/3342775.3342791.
62. Bortoli M, Furini M, Mirri S, Montangero M and Prandi C. Conversational interfaces for a smart campus: a case study. In: Proceedings of the international conference on advanced visual interfaces (AVI ’20). 2020. https://doi.org/10.1145/3399715.3399914.
63. Wu Y, Edwards Y, Cooney O, Bleakley A, Doyle PR, Clark L, Rough D and Cowan BR. Mental workload and language production in non-native speaker IPA interaction. In: Proceedings of the 2nd Conference on Conversational User Interfaces (CUI ’20). 2020. https://doi.org/10.1145/3405755.3406118
64. Brüggemeier B, Breiter M, Kurz M and Schiwy J. User experience of Alexa when controlling music: comparison of face and construct validity of four questionnaires. In: Proceedings of the 2nd conference on conversational user interfaces (CUI ’20). 2020. https://doi.org/10.1145/3405755.3406122
65. Machine body language: expressing a smart speaker’s activity with intelligible physical motion. 57
66. Bartneck C, Kulić D, Croft E, Zoghbi S. Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots. Int J Soc Robot. 2009;1(1):71–81. https://doi.org/10.1007/s12369-008-0001-3.
67. Braun M, Mainz A, Chadowitz R, Pfleging B and Alt F. At your service: designing voice assistant personalities to improve automotive user interfaces. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19), 2019;40:1–40:11. https://doi.org/10.1145/3290605.3300270
68. Burbach L, Halbach P, Plettenberg N, Nakayama J, Ziefle M and Valdez AC. “Hey, Siri”, “Ok, Google”, “Alexa”. Acceptance-relevant factors of virtual voice-assistants. In: 2019 IEEE
International Professional Communication Conference (ProComm) (ProComm ’19), 2019;101–111. https://doi.org/10.1109/ProComm.2019.00025.
69. Pal D, Arpnikanondt C, Funilkul S, and Varadarajan V. User experience with smart voice assistants: The accent perspective. In: 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT ’19), 2019;1–6. https://doi.org/10.1109/ICCCNT45670.2019.8944754.
70. Chin H, Molefi L, and Yi Y. Empathy is all you need: How a conversational agent should respond to verbal abuse. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI ’20), 2020; 1–13. https://doi.org/10.1145/3313831.3376461.
71. Crowell CR, Villano M, Scheutz M and Schermerhorn P. Gendered voice and robot entities: Perceptions and reactions of male and female subjects. In: Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), 2009; 3735–3741. https://doi.org/10.1109/IROS.2009.5354204
72. Lee S, Cho M and Lee S. What if conversational agents became invisible? Comparing users’ mental models according to physical entity of AI speaker. In: Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2020; 4, 3. https://doi.org/10.1145/3411840
73. Dahlbäck N, Wang QY, Nass C and Alwin J. Similarity is more important than expertise: Accent effects in speech interfaces. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’07), 2007; 1553–1556. https://doi.org/10.1145/1240624.1240859
74. Lee EJ, Nass C, and Brave S. Can computer-generated speech have gender?: An experimental test of gender stereotype. In: Proceedings of the CHI’00 Extended Abstracts on Human factors in Computing Systems (CHI EA ’00), 2000; 289–290. https://doi.org/10.1145/633292.633461
75. Nass C, Jonsson I-M, Harris H, Reaves B, Endo J, Brave S and Takayama L. Improving automotive safety by pairing driver emotion and car voice emotion. In: CHI ’05 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’05), 2005;1973–6. https://doi.org/10.1145/1056808.1057070.
76. Shi Y, Yan X, Ma X, Lou Y and Cao N. Designing emotional expressions of conversational states for voice assistants: Modality and engagement. In: Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18), 2018;1–6. https://doi.org/10.1145/3170427.3188560.
77. Kim S, Goh J, and Jun S. The use of voice input to induce human communication with banking chatbots. In: Companion of the 2018 ACM/IEEE International Conference on Human-Robot Interaction (HRI Companion ’18), 2018;151–152. https://doi.org/10.1145/3173386.3176970.
78. Shamekhi A, Liao QV, Wang D, Bellamy RKE and Erickson T. Face value? Exploring the effects of embodiment for a group facilitation agent. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI ’18), 2018;391:1–391:13. https://doi.org/10.1145/3173574.3173965
79. Torre I, Goslin J, White L and Zanatto D. Trust in artificial voices: a “congruency effect” of first impressions and behavioral experience. In: Proceedings of the 2018 Technology, Mind, and Society Conference (TechMindSociety ’18), 2018. Article No. 40. https://doi.org/10.1145/3183654.3183691.
80. Yarosh S, Thompson S, Watson K, Chase A, Senthilkumar A, Yuan Y and Brush AJB. Children asking questions: Speech interface reformulations and personification preferences. In: Proceedings of the 17th ACM Conference on Interaction Design and Children (IDC ’18), 2018;300–12. https://doi.org/10.1145/3202185.3202207.
81. Stucker BE, Wicker R. Direct digital manufacturing of integrated naval systems using ultrasonic consolidation, support material deposition and direct write technologies. UTAH STATE UNIV LOGAN; 2012.
82. Kaplan A, Haenlein M. Siri, Siri, in my hand: Who’s the fairest in the land? On the interpretations, illustrations, and implications of artificial intelligence. Bus Horiz. 2019;62(1):15–25.
83. Humphry J and Chesher C. Preparing for smart voice assistants: Cultural histories and media innovations. New Media Soc. 2020;1461444820923679
84. Moar JS. Covid-19 and the Voice Assistants Market. Juniper Research. Retrieved November 25, 2021, from https://www.juniperresearch.com/blog/august-2021/covid-19-and-the-voice-assistants-market
85. Vailshery LS. Topic: Smart speakers. Statista. Retrieved November 25, 2021, from https://www.statista.com/topics/4748/smart-speakers/#:~:text=As%20of%202019%20an%20estimated,increase%20to%20around%2075%20percent
86. Pal D, Vanijja V, Zhang X, Thapliyal H. Exploring the antecedents of consumer electronics IoT devices purchase decision: a mixed methods study. IEEE Trans Consum Electron. 2021;67(4):305–18. https://doi.org/10.1109/TCE.2021.3115847.
87. Pal D, Arpnikanondt C, Razzaque MA, Funilkul S. To trust or not-trust: privacy issues with voice assistants. IT Professional. 2020;22(5):46–53. https://doi.org/10.1109/MITP.2019.2958914.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.