Visual Informatics
journal homepage: www.elsevier.com/locate/visinf
Research Article
article info

Article history:
Received 10 January 2024
Received in revised form 7 June 2024
Accepted 11 June 2024
Available online 13 June 2024

Keywords:
Automated visual analytics
Visualization recommendation
Insight mining

abstract

With the rapid growth in the scale and complexity of datasets, creating proper visualizations for users is becoming increasingly challenging. Although several visualization recommendation systems have been proposed, the lack of practical engineering input remains a major concern for the use of visualization recommendation in industry. In this paper, we propose AVA, an open-source, web-based framework for Automated Visual Analytics. AVA contains both empiric-driven and insight-driven visualization recommendation methods, which meet the demands of creating aesthetic visualizations and of understanding expressible insights, respectively. The code is available at https://github.com/antvis/AVA.

© 2024 The Authors. Published by Elsevier B.V. on behalf of Zhejiang University and Zhejiang University Press Co. Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
https://doi.org/10.1016/j.visinf.2024.06.002
2468-502X/© 2024 The Authors. Published by Elsevier B.V. on behalf of Zhejiang University and Zhejiang University Press Co. Ltd. This is an open access article under the
CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
J. Wang, X. Li, C. Li et al. Visual Informatics 8 (2024) 106–114
In particular, these requirements are drawn from complex, real-world visualization development scenarios. This motivates us to propose AVA, which mainly integrates two modules: the base module and the recommendation module. The base module consists of basic syntax and rules that support chart recommendation. The recommendation module covers the main process of data visualization, including data pre-processing, empiric-driven recommendation, insight-driven recommendation, and narrative data interpretation. We demonstrate the effectiveness and usability of AVA through case studies and a comparative analysis based on real-world scenarios. The main contributions of this paper are as follows:

• An engineering framework for intelligent visual analytics that has low learning and customization costs and suits diverse business scenarios.
• A visual recommendation pipeline that incorporates empiric-driven and insight-driven approaches and supports narrative data interpretation.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the design requirements. Section 4 introduces the architecture of the AVA framework. Section 5 presents two case studies and a comparative analysis that validate our work, Section 6 provides a discussion, and Section 7 concludes the paper.

2. Related work

2.1. Insight extraction

Insight extraction focuses on extracting useful data facts, also known as insights, from data (Battle and Ottley, 2023; North, 2006; Gotz and Zhou, 2009). Insights can be classified into various types, such as outstanding, dominance, top-k, outlier, and increasing trend (Lin et al., 2018; Tang et al., 2017; Ma et al., 2021). If the dataset is small, all possible fields can be enumerated based on data type and name (Wongsuphasawat et al., 2015). DataSite (Cui et al., 2019a) analyzes data through predefined algorithms, such as calculating Pearson correlation coefficients (Benesty et al., 2009) for all combinations of numerical attributes. Other studies evaluate insights by designing metrics that identify the more valuable ones. For example, Voder (Srinivasan et al., 2018) uses a set of predefined heuristics to group data facts into three tiers, where Tier 1 consists of the most prominent data facts. Ding et al. (2019) designed an insight evaluation algorithm that eliminates easily inferable insights to achieve high-quality results. Mafrur et al. (2018) proposed a hybrid objective utility function that captures both the importance and the diversity of insights. Vartak et al. (2015) built SeeDB, a DBMS middleware that uses a deviation-based utility metric to surface large deviations from a reference.

As the dimensionality of the data increases, the space of possible insights becomes very large, making it difficult for computation to keep up with interaction requirements. To accelerate the computation, Foresight (Demiralp et al., 2017) proposed sketch composition for fast approximate computation of insight metrics. Calliope (Shi et al., 2020) uses a logic-oriented Monte Carlo tree search algorithm that avoids time-consuming enumeration of the data space and relies on a reward function and a logic filter to ensure the quality of the generated results. Databiting (Rey et al., 2024) supports interaction with personal data to provide enriched insight exploration on mobile devices. Recent advancements in this field include systems like XInsight (Ma et al., 2023b), which provides a general framework for explainable data analysis through the lens of causality, offering qualitative and quantitative explanations that significantly improve human understanding of, and confidence in, data analysis outcomes. Similarly, InsightPilot (Ma et al., 2023a), a system based on large language models (LLMs), simplifies data exploration by issuing a sequence of analysis actions that explore the data and generate insights from natural language questions. In addition, industry has developed systems with similar automatic insight generation features, such as Microsoft Power BI,1 Google Sheets,2 and Amazon QuickSight.3

1 https://www.microsoft.com/en-us/power-platform/products/power-bi/.
2 https://www.google.com/sheets/about/.
3 https://aws.amazon.com/quicksight/.

In addition, novel visualization and visual analytics approaches utilizing various statistical and modeling techniques have been proposed to guide efficient insight exploration in analytics contexts (Zhou et al., 2022a). Examples include characterizing empirical and statistical features as guidance (Ceneda et al., 2016; Zhou et al., 2021; Kale et al., 2023), contextualizing data to mitigate bias and paradoxes (Armstrong and Wattenberg, 2014; Gotz et al., 2016; Dimara et al., 2018), aggregating datasets to support meaningful interpretations (Xiong et al., 2019; Borland et al., 2024; Wang et al., 2024), and analyzing insights under domain-specific expertise and usage scenarios, such as event sequence analysis (Gotz and Stavropoulos, 2014; Guo et al., 2017; Jin et al., 2020), privacy and fraud analysis (Zhou et al., 2022b, 2023; Nanayakkara et al., 2024), and geospatial insight exploration (Wood et al., 2007; Chen et al., 2017; Zhou et al., 2018). These methods underscore the ongoing focus on insight extraction to support visualization.

However, despite the rich set of insight extraction methods provided by the aforementioned works, there is a notable gap in research and front-end tooling when it comes to determining the most effective visual representations for different types of insights. Because users may have different preferences across types of insights and visualization tasks (Quadri and Rosen, 2021; Quadri et al., 2024), current practices may not fully exploit the potential of visualization to enhance the interpretability and impact of the extracted insights. In our work, AVA takes into account both the characteristics of the data and different user preferences to control the output of the insight results.

2.2. Visualization recommendation

Interest in facilitating visual data exploration by recommending visualizations has recently increased (Zhou et al., 2022a). As early as 1986, Mackinlay (1986) proposed APT, which ranks perceptual channels based on the type of data fields. This provides a reference guideline for recommending visual encodings of data and is employed in systems such as Tableau (Mackinlay et al., 2007) and Voyager (Wongsuphasawat et al., 2015, 2017).

Depending on whether the user explicitly specifies the purpose of the analysis or the system implicitly infers user intent from the data, a chart type appropriate for that task can be recommended (Gotz and Zhou, 2009; Zeng et al., 2021). For example, in DataVizard (Ananthanarayanan et al., 2018), a line chart is recommended when the user wants to see the trend of the data. When the user selects a numeric variable and a categorical variable, the system infers that the user wants to compare the different categories and recommends a bar chart. In addition to approaches based on empirical rules, prior studies recommend visualizations by exploring users' intents from their behaviors or interactions (Gotz and Wen, 2009; Brown et al., 2014), as well as through constraint-based optimization
using off-the-shelf solvers for visual design criteria (Moritz et al., 2018; Lin et al., 2020). Data2Vis (Dibia and Demiralp, 2019), VizML (Hu et al., 2019), and DataShot (Wang et al., 2019) employ end-to-end models to directly learn the mapping between data and visual encodings. Chen et al. (2021), on the other hand, propose VizLinter, an architecture that includes a linter and a fixer. It checks existing visualizations for violations of design guidelines and suggests fixes using linear programming.

In addition, prior work has investigated how to recommend visual encodings that involve aesthetics and perception. For example, LADV (Ma et al., 2020) formalized several aesthetic metrics to optimize the grid layout of dashboards. MobileVisFixer (Wu et al., 2020) employed a reinforcement learning framework to adjust layout parameters and generate more mobile-friendly visualizations. ColorCrafting (Smart et al., 2019) employed an algorithmic approach that models designer practices by analyzing patterns in the structure of designer-crafted color ramps.

More recently, with the widespread development of machine learning models, AI-powered tools have been proposed to support visualization recommendation as well (Chen et al., 2023). Calliope (Shi et al., 2020) and AutoClips (Shi et al., 2021) support the automated generation of data stories and data videos from large datasets. Viz2viz (Wu et al., 2023b) utilized diffusion models to generate aesthetically stylized visualizations. ChartSpark (Xiao et al., 2023) further incorporated semantic context into generated pictorial visualizations with text-to-image models. The incorporation of LLMs in visualization tasks has opened new possibilities. Li et al. (2024) showcase the effectiveness of LLMs, specifically GPT-3.5, in generating Vega-Lite specifications from natural language descriptions. ChartGPT (Tian et al., 2024) introduced a natural language interface that generates charts from abstract natural language inputs with LLM support. LIDA (Dibia, 2023) is another example that combines LLMs with image generation models to produce charts and infographics. It defines a four-stage task in which these models interpret datasets and analysis objectives. Data Formulator (Wang et al., 2023) introduces a paradigm where an AI agent separates high-level visualization intent from low-level data transformation steps, enabling the automatic generation of desired visualizations from defined data concepts. These generative methods serve as good creativity tools when generating charts. However, they do not fit the underlying data well, resulting in low performance in generating actionable insights (Basole and Major, 2024).

The aforementioned literature offers a variety of effective methods for automatically recommending chart types. However, real-world engineering practice often requires interpretability and certainty in the recommendation results, as well as a user-friendly development approach on the engineering side. AVA provides empiric-driven and insight-driven visualization recommendation methods. It also offers an explainable rule system and a suite of front-end tools centered around automated visualization. These features help ensure that the chart recommendations are interpretable and robust in an engineering context. Moreover, existing methods still require users to interpret the insights themselves after visualizations are recommended. AVA further supports narrative visualization and helps users understand the information conveyed by a visualization through descriptive text.

3. Design consideration

In this section, we analyze in depth the actual scenarios of visualization development and present our design goals.

3.1. Visualization development scenarios and issues

To better understand the problems that visualization developers encounter in practice, we conducted semi-structured interviews with three experts. All three experts are from an IT company: one is a product manager, one is a front-end engineer, and the other is a designer. They all have extensive experience in developing visualization systems such as business intelligence systems. Typically, a product manager comes up with a requirement and writes a product requirements document. This document goes through several rounds of review and iteration before it becomes a formal requirement, after which the project is scheduled. The front-end and back-end engineers then develop according to the requirements document and the design drawings. After development is completed, the work goes through self-testing, front-end and back-end debugging, and regression testing, and any bugs found in this process must be fixed. Finally, it must be checked before it can be gradually rolled out to the production environment. We summarized three common issues from the interviews.

I1 Lack of visualization expertise. Developers' lack of visualization design knowledge and development skills makes it difficult to implement data visualization requirements, especially in innovation-incubated and urgent projects.

I2 Diverse requirements for different businesses. Different businesses have different needs, such as the configuration of the chart, and making personalized changes often takes a lot of time in actual development.

I3 Difficulty in choosing the right chart. At the beginning, developers often do not know which form of visualization to use for their data, and many iterations are required to find a suitable one. In addition, differences in the data itself can affect the comprehensibility of the charts.

3.2. Design goals

After learning about the problems frequently encountered by people involved in visualization projects, we also asked them which features they would like to see in existing visualization development frameworks to solve these problems. From their answers, we summarized three design goals.

G1 Reduce development effort. Developers want to reduce the workload and the knowledge threshold of visualization chart development, and they do not always want to be involved in every step of the visualization pipeline. Recommending good visualizations would ease their burden, both for individual visualizations and for collections of visualizations such as dashboards.

G2 Support personalized configuration. Support custom chart configuration to reduce tedious code adjustments. It would be beneficial to expose customization capabilities, such as switching specific recommendation rules on or off and independently modifying part of the recommendation process.

G3 Adapt to different data. Users want to be able to select a suitable chart according to the characteristics of their data. For example, a line chart usually reflects the trend of time-series data well, but when the data contain many outliers, a scatter chart presents them more clearly. On the other hand, when users face unfamiliar data, a visualization generated adaptively from the data can help them explore it quickly.
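Design goal G3 can be made concrete with a small data-adaptive rule. The TypeScript sketch below is hypothetical and is not taken from AVA. It measures the share of outliers with Tukey's 1.5 × IQR fences and switches from a line chart to a scatter chart when more than 10% of the points are outliers. Both thresholds and all function names are assumptions chosen for illustration.

```typescript
// Hypothetical illustration of design goal G3; not AVA's implementation.
// Assumptions: outliers are detected with Tukey's 1.5 * IQR fences, and a
// scatter chart is preferred once more than 10% of the values are outliers.

type ChartType = "line" | "scatter";

// Quantile with linear interpolation on an already sorted array.
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Fraction of values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
function outlierRatio(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const low = q1 - 1.5 * iqr;
  const high = q3 + 1.5 * iqr;
  const outliers = values.filter((v) => v < low || v > high).length;
  return outliers / values.length;
}

// Pick a chart for a time series: a line chart shows the trend, but a
// scatter chart presents the data more clearly when outliers are frequent.
function chooseTimeSeriesChart(values: number[], maxOutlierRatio = 0.1): ChartType {
  return outlierRatio(values) > maxOutlierRatio ? "scatter" : "line";
}

console.log(chooseTimeSeriesChart([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])); // prints line
console.log(chooseTimeSeriesChart([1, 2, 3, 4, 5, 6, 7, 8, 100, 200, 300])); // prints scatter
```

In the second call, 200 and 300 lie above the upper fence, so 2 of 11 points (about 18%) are outliers and the rule falls back to a scatter chart. A real recommender would weigh such a check against other rules rather than apply it in isolation.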
Fig. 1. The overall architecture of AVA 3.0. Multiple npm packages are integrated into one: @antv/ava. It exposes its core features as APIs, which simplifies usage while ensuring flexibility. The framework provides React components in @antv/ava-react, and the new NTV module supports narrative text visualization. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
4. Architecture
4 https://github.com/antvis/antv-spec.
5 https://github.com/antvis/g2plot.
6 https://chartcube.alipay.com/.
7 https://github.com/antvis/AVA.
weights for our soft rules based on the business demands. The full
definition of the rules can be found on GitHub.8
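The split between hard rules and weighted soft rules can be illustrated with a short sketch. This hypothetical TypeScript example filters candidate charts by hard rules and ranks the remaining charts by a weighted sum of soft-rule scores. Each rule can also be switched off, in the spirit of G2. The chart names, rule ids, weights, and the `DataProfile` summary are invented for illustration. They do not reproduce the rule definitions on GitHub or the signature of AVA's `Advise()` function.

```typescript
// Hypothetical sketch of rule-based chart ranking; not AVA's rule set or API.
// Hard rules act as filters, soft rules add weighted scores, and every rule
// can be switched on or off.

type Chart = "line_chart" | "bar_chart" | "pie_chart" | "scatter_plot";

// A simplified summary of the input dataset (assumed fields).
interface DataProfile {
  rowCount: number;
  numericFields: number;
  hasTimeField: boolean;
}

interface Rule {
  id: string;
  kind: "hard" | "soft";
  weight: number; // only used by soft rules
  enabled: boolean; // lets users turn individual rules off
  // Returns a value in [0, 1]; for a hard rule, 0 means "violated".
  check: (chart: Chart, data: DataProfile) => number;
}

interface Advice {
  chart: Chart;
  score: number;
}

function advise(data: DataProfile, charts: Chart[], rules: Rule[]): Advice[] {
  const active = rules.filter((r) => r.enabled);
  const hard = active.filter((r) => r.kind === "hard");
  const soft = active.filter((r) => r.kind === "soft");
  const advices: Advice[] = [];
  for (const chart of charts) {
    // Drop any chart that violates an enabled hard rule.
    if (hard.some((r) => r.check(chart, data) === 0)) continue;
    // Score the remaining charts by their weighted soft-rule results.
    const score = soft.reduce((sum, r) => sum + r.weight * r.check(chart, data), 0);
    advices.push({ chart, score });
  }
  return advices.sort((a, b) => b.score - a.score); // best first
}

const exampleRules: Rule[] = [
  { id: "line-needs-time", kind: "hard", weight: 0, enabled: true,
    check: (c, d) => (c === "line_chart" && !d.hasTimeField ? 0 : 1) },
  { id: "pie-few-slices", kind: "hard", weight: 0, enabled: true,
    check: (c, d) => (c === "pie_chart" && d.rowCount > 7 ? 0 : 1) },
  { id: "scatter-needs-two-numeric", kind: "hard", weight: 0, enabled: true,
    check: (c, d) => (c === "scatter_plot" && d.numericFields < 2 ? 0 : 1) },
  { id: "prefer-line-for-trend", kind: "soft", weight: 2, enabled: true,
    check: (c, d) => (c === "line_chart" && d.hasTimeField ? 1 : 0) },
  { id: "prefer-bar-for-comparison", kind: "soft", weight: 1, enabled: true,
    check: (c) => (c === "bar_chart" ? 1 : 0) },
];

// Usage: twelve time-stamped values of a single measure.
const profile: DataProfile = { rowCount: 12, numericFields: 1, hasTimeField: true };
const ranked = advise(profile, ["line_chart", "bar_chart", "pie_chart", "scatter_plot"], exampleRules);
console.log(ranked.map((a) => `${a.chart}:${a.score}`).join(", "));
// prints line_chart:2, bar_chart:1
```

In this example, the pie chart and the scatter plot are removed by hard rules. The line chart then outranks the bar chart because of the heavier soft rule. Disabling `prefer-line-for-trend` would reverse that order, which is the kind of per-rule customization G2 asks for.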
4.1.4. ntv-schema
In order to standardize the style of textual expression, AVA applies the concept of visual channel mapping from data visualization theory to regulate how text is visually represented, and implements this in NTV.

To formalize this specification, AVA defines a set of declarative schemas for interpreting text, namely ntv-schema. This facilitates the exchange of the schema among different systems and also prepares for future intelligent recommendations.
Fig. 3. The Data pre-processing module contains the following components and functions: (i) general functions and (ii) DataSet (non-entity concept).

The ntv-schema is divided into two layers: the structural layer and the phrase layer. In the structural layer, the whole interpretation structure is called a narrative, which consists of a headline and multiple sections. Each section is composed of several paragraphs, and each paragraph consists of multiple phrases. The phrase layer reflects the most significant difference between ‘‘data-describing text’’ and ordinary text. Phrases are categorized into three types: text, entity, and custom. The text type is ordinary plain text. The entity type represents phrases with data meaning that map data to text. These phrases form the main content to be visualized in the interpreted text. The custom type is a phrase slot that users can customize, often used to implement phrase-level interactions.
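The two layers of ntv-schema map naturally onto nested TypeScript types. The following sketch is a hypothetical model, not the published ntv-schema. The type and field names (such as `entityType`, `origin`, and `customType`), the bracket rendering, and the sample sentence are assumptions made for illustration.

```typescript
// Hypothetical TypeScript model of the two-layer ntv-schema structure.
// Type and field names are illustrative assumptions; they do not reproduce
// the schema published in the AVA repository.

// Phrase layer: plain text, data-bound entities, and user-defined slots.
interface TextPhrase {
  type: "text";
  value: string;
}

interface EntityPhrase {
  type: "entity";
  value: string; // text shown to the reader
  entityType: string; // kind of data the phrase describes (assumed labels)
  origin?: number; // raw data value the phrase was generated from
}

interface CustomPhrase {
  type: "custom";
  value: string;
  customType: string; // name of the user-defined slot, e.g. for interactions
}

type Phrase = TextPhrase | EntityPhrase | CustomPhrase;

// Structural layer: narrative -> sections -> paragraphs -> phrases.
interface Paragraph {
  phrases: Phrase[];
}

interface Section {
  paragraphs: Paragraph[];
}

interface Narrative {
  headline: Paragraph;
  sections: Section[];
}

// Flatten a narrative into plain text, one line per paragraph. Entity
// phrases are wrapped in brackets to mark where a renderer could apply
// visual encodings such as color or inline charts.
function toPlainText(narrative: Narrative): string {
  const renderParagraph = (p: Paragraph): string =>
    p.phrases.map((ph) => (ph.type === "entity" ? `[${ph.value}]` : ph.value)).join("");
  const lines: string[] = [renderParagraph(narrative.headline)];
  for (const section of narrative.sections) {
    for (const paragraph of section.paragraphs) {
      lines.push(renderParagraph(paragraph));
    }
  }
  return lines.join("\n");
}

// Made-up example content for illustration only.
const example: Narrative = {
  headline: { phrases: [{ type: "text", value: "Sales overview" }] },
  sections: [
    {
      paragraphs: [
        {
          phrases: [
            { type: "text", value: "Revenue reached " },
            { type: "entity", value: "1.2M", entityType: "metric_value", origin: 1200000 },
            { type: "text", value: "." },
          ],
        },
      ],
    },
  ],
};

console.log(toPlainText(example));
// prints:
// Sales overview
// Revenue reached [1.2M].
```

The discriminated `type` field lets a renderer handle each phrase kind separately. In this sketch, only entity phrases carry the link back to the data.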
Fig. 4. The Advisor contains two main tool functions: (i) Advise(), which recommends charts automatically, and (ii) Lint(), which provides chart optimization suggestions.

4.2. Recommendation pipeline
The main process is visualization recommendation, which includes data pre-processing, empiric-driven recommendation, insight-driven recommendation, and narrative data interpretation.

4.2.1. Data pre-processing

Data pre-processing (Fig. 1-A) is the data processing and pre-computing library of AVA. It contains general functions and dataset components. The general functions are divided into four parts: Analyzer, Statistics, Random, and Utils.

The Analyzer determines which type a value belongs to, such as integer, float, date, string, or null. Values can be further classified as nominal, ordinal, discrete, etc. Random is a random value generator that can be used to generate mock data, color palettes, and so on. Statistics calculates statistical information about the data, such as the mean, quartiles, variance, and Pearson correlation. Utils consists of data type inference and utility data operations. To improve performance, Data pre-processing also uses WeakMap9 to cache some statistics.

Dataset implements three data structures: Series (one-dimensional), DataFrame (two-dimensional), and Graph (topological and relational). Representing the raw data in an appropriate data structure facilitates the selection of specific visualization forms and insight extraction algorithms. The structure of Data pre-processing is shown in Fig. 3.

4.2.2. Empiric-driven recommendation

Advisor is the empiric-driven chart recommendation and lint library of AVA (Fig. 1-B). It employs a rule-based chart recommendation model, and its pipeline is illustrated in Fig. 4. Both the recommendation and lint processes are based on empirical rules. For a given dataset, the Advisor component computes a score for every chart type under those rules and outputs the one with the highest score. The Linter component computes how well the input chart fits the empirical rules, then fixes the errors or gives feedback to the user. The computation of both components is based on the rules described above. The output is a set of visualization advice that makes the chart satisfy all the hard rules and achieve the largest total weight on the soft rules. The advice follows the antv-spec grammar and can be translated to G2Plot for rendering.

As shown in Fig. 4, an input dataset is first fed into the Advisor component, which outputs visualization advice. The advice is adapted to G2Plot via antv-spec and rendered as a chart. Finally, the chart goes through the Linter component, which can report visualization problems as an optional output that helps users improve their charts.

The two components can be used together or individually. For example, users can input a manually created chart into the Linter to prettify it. In Ant Group’s business intelligence platform, Linter helps users at every level to create charts and get insights easily. With smart and appropriate prompts, users no longer need to search for the right chart configuration because the system suggests it automatically. When a chart has problems, Linter detects them quickly and alerts the user with a yellow breathing light. The user can then follow the guide to fix the problems, as shown in Fig. 5.

4.2.3. Insight-driven recommendation

GetInsights (Fig. 1-C) is the insight-driven visual exploration library of AVA. It employs a pruning-based insight exploration model, which classifies data by insight type through subspace enumeration, pruning, and pattern matching. The insight score is then calculated. The final output includes the insight result, metadata, and an optional output structure. The insight score consists of a numerical impact and a task-related impact. The numerical impact reflects the importance of the insight for the whole dataset and is defined on a specific impact measure (e.g., as a percentage).

8 https://github.com/antvis/AVA.
9 https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/WeakMap.
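The numerical impact described in Section 4.2.3 can be illustrated with a simple share-of-total measure. The TypeScript sketch below is hypothetical and is not GetInsights code. It enumerates one-dimensional subspaces (one value of one dimension), takes each subspace's share of the total measure as its numerical impact, and prunes subspaces below a 5% threshold. The threshold, the restriction to one-dimensional subspaces, the non-negative measure, and the sample rows are all assumptions. The section does not state how numerical and task-related impact are combined, so the task-related part is left out.

```typescript
// Hypothetical sketch of subspace enumeration with a share-of-total
// "numerical impact" and threshold pruning; not AVA's GetInsights code.
// Assumes the measure is non-negative so that shares lie in [0, 1].

type Row = Record<string, string | number>;

interface Subspace {
  dimension: string;
  value: string;
  impact: number; // share of the total measure, e.g. 0.6 = 60%
}

function enumerateSubspaces(
  rows: Row[],
  dimensions: string[],
  measure: string,
  minImpact = 0.05,
): Subspace[] {
  const total = rows.reduce((sum, r) => sum + Number(r[measure]), 0);
  if (total === 0) return [];
  const result: Subspace[] = [];
  for (const dimension of dimensions) {
    // Aggregate the measure for every value of this dimension.
    const sums = new Map<string, number>();
    for (const r of rows) {
      const key = String(r[dimension]);
      sums.set(key, (sums.get(key) ?? 0) + Number(r[measure]));
    }
    // Keep only subspaces whose impact reaches the pruning threshold.
    sums.forEach((sum, value) => {
      const impact = sum / total;
      if (impact >= minImpact) result.push({ dimension, value, impact });
    });
  }
  return result.sort((a, b) => b.impact - a.impact);
}

// Made-up sample data for illustration only.
const rows: Row[] = [
  { city: "A", sales: 60 },
  { city: "B", sales: 30 },
  { city: "C", sales: 6 },
  { city: "D", sales: 4 },
];

console.log(enumerateSubspaces(rows, ["city"], "sales").map((s) => `${s.value}=${s.impact}`).join(" "));
// prints A=0.6 B=0.3 C=0.06
```

City D contributes only 4% of total sales, so it is pruned before any further pattern matching would run on it. Pruning early in this way is what keeps subspace enumeration tractable as dimensions are added.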
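Two ideas from the Data pre-processing module (Section 4.2.1) can be sketched as follows: classifying raw values by type, and caching statistics in a WeakMap. The function names (`inferValueType`, `mean`, `pearson`, `pairwiseCorrelations`) and the regular-expression rules are hypothetical and are not AVA's API. Only `WeakMap` is the standard JavaScript built-in cited in footnote 9. The pairwise loop mirrors the DataSite-style computation of Pearson correlation over all pairs of numerical attributes mentioned in Section 2.1.

```typescript
// Hypothetical sketch of two Data pre-processing ideas; not AVA's API.
// (1) Analyzer-style classification of a single raw value.
// (2) Statistics-style helpers whose results are cached in a WeakMap.

type ValueType = "null" | "integer" | "float" | "date" | "string";

function inferValueType(raw: unknown): ValueType {
  if (raw === null || raw === undefined || raw === "") return "null";
  if (typeof raw === "number") {
    if (Number.isNaN(raw)) return "null";
    return Number.isInteger(raw) ? "integer" : "float";
  }
  const s = String(raw).trim();
  if (/^[-+]?\d+$/.test(s)) return "integer";
  if (/^[-+]?(\d+\.\d*|\.\d+|\d+)([eE][-+]?\d+)?$/.test(s)) return "float";
  // Deliberately narrow date rule: ISO-like "yyyy-mm-dd" prefixes that parse.
  if (/^\d{4}-\d{2}-\d{2}/.test(s) && !Number.isNaN(Date.parse(s))) return "date";
  return "string";
}

// The cache is keyed by the array object itself, so an entry becomes
// collectable as soon as the array is no longer referenced elsewhere.
const meanCache = new WeakMap<readonly number[], number>();

function mean(values: readonly number[]): number {
  const cached = meanCache.get(values);
  if (cached !== undefined) return cached;
  const result = values.reduce((acc, v) => acc + v, 0) / values.length;
  meanCache.set(values, result);
  return result;
}

// Pearson correlation r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)).
function pearson(x: readonly number[], y: readonly number[]): number {
  if (x.length !== y.length || x.length < 2) {
    throw new Error("pearson() needs two series of equal length >= 2");
  }
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < x.length; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Correlation for every pair of numeric columns, as in DataSite-style scans.
function pairwiseCorrelations(
  columns: Record<string, readonly number[]>,
): Array<[string, string, number]> {
  const names = Object.keys(columns);
  const out: Array<[string, string, number]> = [];
  for (let i = 0; i < names.length; i++) {
    for (let j = i + 1; j < names.length; j++) {
      out.push([names[i], names[j], pearson(columns[names[i]], columns[names[j]])]);
    }
  }
  return out;
}

console.log(["42", "3.14", "2024-06-13", "Hangzhou", null].map(inferValueType).join(", "));
// prints integer, float, date, string, null
```

Because entries are keyed by array identity, the sketch assumes arrays are treated as immutable once their statistics have been computed. Mutating an array in place would leave a stale cached mean. A constant series yields `NaN` from `pearson()` because its spread is zero.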
5.1.2. Insight-driven recommendation

For the insight-driven visualization recommendation tool, the main feedback was that it helps users quickly grasp an overview of the data and significantly reduces the time spent on exploratory analysis.

Fig. 7 shows the process when expert P2 conducts exploratory analysis of a dataset with our insight-driven recommendation tool. In this scenario, P2 loaded an unfamiliar dataset containing daily discounted prices over a certain period of time. Previously, P2 would spend tens of minutes comprehending the overall nature of the data using different analytical and visualization tools. With our recommendation process, we can show P2 overall insights extracted from the whole dataset, including the trend and outliers. According to users’ feedback, the insight-driven process helps them understand new and unfamiliar datasets faster and more easily.

5.2. Comparative analysis

We compared AVA with several data visualization tools, libraries, and systems commonly used in industry across various dimensions. These include Tableau (Mackinlay et al., 2007), ECharts (Li et al., 2018), Vega-Lite (Satyanarayan et al., 2016), D3.js (Bostock et al., 2011), Matplotlib (Hunter, 2007), and Draco (Moritz et al., 2018). The results of the comparison are presented in Table 1.

Table 1
Comparison of visualization tools.

Library      Insight mining   Chart recommendation   Usability   Customizability
AVA          ✓                ✓                      Moderate    Moderate
ECharts      ✗                ✗                      Moderate    Moderate
Vega-Lite    ✗                ✗                      Moderate    Moderate
D3.js        ✗                ✗                      Hard        Powerful
Tableau      ✗                ✓                      Easy        Limited
Matplotlib   ✗                ✗                      Hard        Powerful
Draco        ✗                ✓                      Moderate    Moderate

10 https://chartcube.alipay.com/.

Despite its innovative features, AVA also has a few limitations worth noting. First, it lacks support for configuration recommendations, leaving the task of understanding and setting configurations to the user. This could complicate the user experience, particularly for those new to data visualization (Stolper et al., 2014). Second, AVA currently focuses on recommending statistical charts. In particular, it does not provide comprehensive solutions for complex chart types such as maps, graphs, or juxtaposed charts, which limits its potential for higher levels of visualization comprehension (Quadri et al., 2024). Another ongoing challenge is to expand AVA’s coverage and recommendation capabilities while keeping it user-friendly and easy to use, in line with open-source software best practices (Wilson et al., 2017).

6.3. Future work

AVA has made initial attempts at intelligent visualization, and there are many valuable directions for future development. First, we plan to summarize more design experience from visualization academia and industry to improve visualization recommendation and the linter. Second, the underlying declarative syntax across AntV will continue to be improved, and interaction syntax will be added to it so that interactions in visualization
charts can be recommended. Finally, the recommended visualiza- Basole, R.C., Major, T., 2024. Generative AI for visualization: Opportunities and
tion charts are currently mainly limited to regular diagrams, and challenges. IEEE Comput. Graph. Appl. 44 (2), 55–64.
Battle, L., Ottley, A., 2023. What do we mean when we say ‘‘insight’’? A formal
we intend to add recommendations for visual forms such as maps
synthesis of existing theory. IEEE Trans. Vis. Comput. Graphics.
and graphs to suit more practical application scenarios. Benesty, J., Chen, J., Huang, Y., Cohen, I., 2009. Pearson correlation coefficient.
In: Noise Reduction in Speech Processing. Springer, pp. 1–4.
7. Conclusion Borland, D., Wang, A.Z., Gotz, D., 2024. Using counterfactuals to improve causal
inferences from visualizations. IEEE Comput. Graph. Appl. 44 (1), 95–104.
Bostock, M., Ogievetsky, V., Heer, J., 2011. D3: Data-driven documents. IEEE
This paper introduces AVA, an open-source framework that Trans. Vis. Comput. Graphics 17 (12), 2301–2309.
addresses the complexities of modern data visualization. AVA Brown, E.T., Ottley, A., Zhao, H., Lin, Q., Souvenir, R., Endert, A., Chang, R., 2014.
combines empiric and insight-driven methods, offering visually Finding waldo: Learning about users from their interactions. IEEE Trans. Vis.
appealing and meaningful charts and narrative insight descrip- Comput. Graphics 20 (12), 1663–1672.
Ceneda, D., Gschwandtner, T., May, T., Miksch, S., Schulz, H.-J., Streit, M.,
tions. Through cases and comparative analysis, we show that
Tominski, C., 2016. Characterizing guidance in visual analytics. IEEE Trans.
AVA enables effective visualization recommendation with flexible Vis. Comput. Graphics 23 (1), 111–120.
customization. As an open-source tool, AVA invites continuous Chen, Q., Cao, S., Wang, J., Cao, N., 2023. How does automation shape the
improvement for further refinement in tackling more complex process of narrative visualization: A survey of tools. IEEE Trans. Vis. Comput.
data and scenarios. Graphics.
Chen, W., Huang, Z., Wu, F., Zhu, M., Guan, H., Maciejewski, R., 2017. VAUD:
A visual analysis approach for exploring spatio-temporal urban data. IEEE
CRediT authorship contribution statement Trans. Vis. Comput. Graphics 24 (9), 2636–2648.
Chen, Q., Sun, F., Xu, X., Chen, Z., Wang, J., Cao, N., 2021. Vizlinter: A linter and
Jiazhe Wang: Software, Writing – original draft. Xi Li: Soft- fixer framework for data visualization. IEEE Trans. Vis. Comput. Graphics 28
(1), 206–216.
ware. Chenlu Li: Software, Writing – review & editing. Di Peng:
Cui, Z., Badam, S.K., Yalçin, M.A., Elmqvist, N., 2019a. Datasite: Proactive visual
Software. Arran Zeyu Wang: Software, Writing – review & edit- data exploration with computation of insight-based recommendations. Inf.
ing. Yuhui Gu: Software. Xingui Lai: Software. Haifeng Zhang: Vis. 18 (2), 251–267.
Software. Xinyue Xu: Software. Xiaoqing Dong: Conceptualiza- Cui, W., Zhang, X., Wang, Y., Huang, H., Chen, B., Fang, L., Zhang, H., Lou, J.-
G., Zhang, D., 2019b. Text-to-viz: Automatic generation of infographics from
tion. Zhifeng Lin: Conceptualization. Jiehui Zhou: Writing – orig-
proportion-related natural language statements. IEEE Trans. Vis. Comput.
inal draft. Xingyu Liu: Writing – original draft. Wei Chen: Super- Graphics 26 (1), 906–916.
vision. Demiralp, Ç., Haas, P.J., Parthasarathy, S., Pedapati, T., 2017. Foresight:
Recommending visual insights. Proc. VLDB J..
Declaration of competing interest Dibia, V., 2023. LIDA: A tool for automatic generation of grammar-agnostic
visualizations and infographics using large language models. arXiv:2303.
02927.
The authors declare that they have no known competing finan- Dibia, V., Demiralp, Ç., 2019. Data2vis: Automatic generation of data visualiza-
cial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

We would like to thank Jiefeng Yuan, Fuling Sun, Nan Chen, Ying Chen, Peiying Zhang and Liangjing Bi for their kind help on this paper. Wei Chen is supported by the National Natural Science Foundation of China (62132017) and the Zhejiang Provincial Natural Science Foundation of China (LD24F020011).

Ethical approval

This study does not contain any studies with human or animal subjects performed by any of the authors.

References

Ananthanarayanan, R., Lohia, P.K., Bedathur, S., 2018. Datavizard: Recommending visual presentations for structured data. In: Proc. 21st International Workshop on the Web and Databases. pp. 1–6.
Armstrong, Z., Wattenberg, M., 2014. Visualizing statistical mix effects and Simpson’s paradox. IEEE Trans. Vis. Comput. Graphics 20 (12), 2132–2141.
tions using sequence-to-sequence recurrent neural networks. IEEE Comput. Graph. Appl. 39 (5), 33–46.
Dimara, E., Bailly, G., Bezerianos, A., Franconeri, S., 2018. Mitigating the attraction effect with visualizations. IEEE Trans. Vis. Comput. Graphics 25 (1), 850–860.
Ding, R., Han, S., Xu, Y., Zhang, H., Zhang, D., 2019. Quickinsights: Quick and automatic discovery of insights from multi-dimensional data. In: Proc. 2019 ACM SIGMOD Conference. pp. 317–332.
Gleicher, M., Albers, D., Walker, R., Jusufi, I., Hansen, C.D., Roberts, J.C., 2011. Visual comparison for information visualization. Inf. Vis. 10 (4), 289–309.
Gotz, D., Stavropoulos, H., 2014. Decisionflow: Visual analytics for high-dimensional temporal event sequence data. IEEE Trans. Vis. Comput. Graphics 20 (12), 1783–1792.
Gotz, D., Sun, S., Cao, N., 2016. Adaptive contextualization: Combating bias during high-dimensional visualization and data selection. In: Proc. 21st ACM IUI Conference. pp. 85–95.
Gotz, D., Wen, Z., 2009. Behavior-driven visualization recommendation. In: Proc. 14th ACM IUI Conference. pp. 315–324.
Gotz, D., Zhou, M.X., 2009. Characterizing users’ visual analytic activity for insight provenance. Inf. Vis. 8 (1), 42–55.
Guo, S., Xu, K., Zhao, R., Gotz, D., Zha, H., Cao, N., 2017. Eventthread: Visual summarization and stage analysis of event sequence data. IEEE Trans. Vis. Comput. Graphics 24 (1), 56–65.
Hu, K., Bakker, M.A., Li, S., Kraska, T., Hidalgo, C., 2019. Vizml: A machine learning approach to visualization recommendation. In: Proc. 2019 CHI Conf. Hum. Factors Comput. Syst. pp. 1–12.
Hunter, J.D., 2007. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9 (3), 90–95.
Jin, Z., Guo, S., Chen, N., Weiskopf, D., Gotz, D., Cao, N., 2020. Visual causality analysis of event sequence data. IEEE Trans. Vis. Comput. Graphics 27 (2), 1343–1352.
Kale, A., Guo, Z., Qiao, X.L., Heer, J., Hullman, J., 2023. Evm: Incorporating model checking into exploratory visual analysis. IEEE Trans. Vis. Comput. Graphics.
Kaul, S., Borland, D., Cao, N., Gotz, D., 2021. Improving visualization interpretation using counterfactuals. IEEE Trans. Vis. Comput. Graphics 28 (1), 998–1008.
Li, D., Mei, H., Shen, Y., Su, S., Zhang, W., Wang, J., Zu, M., Chen, W., 2018. Echarts: A declarative framework for rapid construction of web-based visualization. Vis. Inf. 2 (2), 136–146.
Li, G., Wang, X., Aodeng, G., Zheng, S., Zhang, Y., Ou, C., Wang, S., Liu, C.H., 2024. Visualization generation with large language models: An evaluation. arXiv:2401.11255.
Lin, Q., Ke, W., Lou, J.-G., Zhang, H., Sui, K., Xu, Y., Zhou, Z., Qiao, B., Zhang, D., 2018. Bigin4: Instant, interactive insight identification for multi-dimensional big data. In: Proc. 24th ACM KDD Conference. pp. 547–555.
Lin, H., Moritz, D., Heer, J., 2020. Dziban: Balancing agency & automation in visualization design via anchored recommendations. In: Proc. 2020 CHI Conf. Hum. Factors Comput. Syst. pp. 1–12.
Ma, P., Ding, R., Han, S., Zhang, D., 2021. MetaInsight: Automatic discovery of structured knowledge for exploratory data analysis. In: Proc. 2021 ACM SIGMOD Conference. pp. 1262–1274.
Ma, P., Ding, R., Wang, S., Han, S., Zhang, D., 2023a. Demonstration of InsightPilot: An LLM-empowered automated data exploration system. arXiv:2304.00477.
Ma, P., Ding, R., Wang, S., Han, S., Zhang, D., 2023b. XInsight: explainable data analysis through the lens of causality. Proc. ACM SIGMOD Conf. 1 (2), 1–27.
Ma, R., Mei, H., Guan, H., Huang, W., Zhang, F., Xin, C., Dai, W., Wen, X., Chen, W., 2020. Ladv: Deep learning assisted authoring of dashboard visualizations from images and sketches. IEEE Trans. Vis. Comput. Graphics 27 (9), 3717–3732.
Mackinlay, J., 1986. Automating the design of graphical presentations of relational information. ACM Trans. Graph. 5 (2), 110–141.
Mackinlay, J., Hanrahan, P., Stolte, C., 2007. Show me: Automatic presentation for visual analysis. IEEE Trans. Vis. Comput. Graphics 13 (6), 1137–1144.
Mafrur, R., Sharaf, M.A., Khan, H.A., 2018. Dive: Diversifying view recommendation for visual data exploration. In: Proc. 27th ACM CIKM Conference. pp. 1123–1132.
Moritz, D., Wang, C., Nelson, G.L., Lin, H., Smith, A.M., Howe, B., Heer, J., 2018. Formalizing visualization design knowledge as constraints: Actionable and extensible models in draco. IEEE Trans. Vis. Comput. Graphics 25 (1), 438–448.
Nanayakkara, P., Kim, H., Wu, Y., Sarvghad, A., Mahyar, N., Miklau, G., Hullman, J., 2024. Measure-observe-remeasure: An interactive paradigm for differentially-private exploratory analysis. In: IEEE Symp. S&P. pp. 231–231.
North, C., 2006. Toward measuring visualization insight. IEEE Comput. Graph. Appl. 26 (3), 6–9.
Qin, X., Luo, Y., Tang, N., Li, G., 2020. Making data visualization more efficient and effective: a survey. Proc. VLDB J. 29 (1), 93–117.
Quadri, G.J., Rosen, P., 2021. A survey of perception-based visualization studies by task. IEEE Trans. Vis. Comput. Graphics.
Quadri, G.J., Wang, A.Z., Wang, Z., Adorno, J., Rosen, P., Szafir, D.A., 2024. Do you see what I see? A qualitative study eliciting high-level visualization comprehension. In: Proc. 2024 CHI Conf. Hum. Factors Comput. Syst. pp. 1–26.
Rey, B., Lee, B., Choe, E.K., Irani, P., 2024. Databiting: Lightweight, transient, and insight rich exploration of personal data. IEEE Comput. Graph. Appl. 44 (2), 65–72.
Satyanarayan, A., Moritz, D., Wongsuphasawat, K., Heer, J., 2016. Vega-lite: A grammar of interactive graphics. IEEE Trans. Vis. Comput. Graphics 23 (1), 341–350.
Satyanarayan, A., Russell, R., Hoffswell, J., Heer, J., 2015. Reactive vega: A streaming dataflow architecture for declarative interactive visualization. IEEE Trans. Vis. Comput. Graphics 22 (1), 659–668.
Shi, D., Sun, F., Xu, X., Lan, X., Gotz, D., Cao, N., 2021. Autoclips: An automatic approach to video generation from data facts. In: Comput. Graph. Forum. Vol. 40, Wiley Online Library, pp. 495–505.
Shi, D., Xu, X., Sun, F., Shi, Y., Cao, N., 2020. Calliope: Automatic visual data story generation from a spreadsheet. IEEE Trans. Vis. Comput. Graphics 27 (2), 453–463.
Smart, S., Szafir, D.A., 2019. Measuring the separability of shape, size, and color in scatterplots. In: Proc. 2019 CHI Conf. Hum. Factors Comput. Syst. pp. 1–14.
Smart, S., Wu, K., Szafir, D.A., 2019. Color crafting: Automating the construction of designer quality color ramps. IEEE Trans. Vis. Comput. Graphics 26 (1), 1215–1225.
Srinivasan, A., Drucker, S.M., Endert, A., Stasko, J., 2018. Augmenting visualizations with interactive data facts to facilitate interpretation and communication. IEEE Trans. Vis. Comput. Graphics 25 (1), 672–681.
Stolper, C.D., Perer, A., Gotz, D., 2014. Progressive visual analytics: User-driven visual exploration of in-progress analytics. IEEE Trans. Vis. Comput. Graphics 20 (12), 1653–1662.
Szafir, D.A., Borgo, R., Chen, M., Edwards, D.J., Fisher, B., Padilla, L., 2023. Visualization Psychology. Springer Nature.
Szafir, D.A., Haroz, S., Gleicher, M., Franconeri, S., 2016. Four types of ensemble coding in data visualizations. J. Vis. 16 (5), 11.
Tang, B., Han, S., Yiu, M.L., Ding, R., Zhang, D., 2017. Extracting top-k insights from multi-dimensional data. In: Proc. 2017 ACM SIGMOD Conference. pp. 1509–1524.
Tian, Y., Cui, W., Deng, D., Yi, X., Yang, Y., Zhang, H., Wu, Y., 2024. Chartgpt: Leveraging LLMs to generate charts from abstract natural language. IEEE Trans. Vis. Comput. Graphics.
Tseng, C., Quadri, G.J., Wang, Z., Szafir, D.A., 2023. Measuring categorical perception in color-coded scatterplots. In: Proc. 2023 CHI Conf. Hum. Factors Comput. Syst.
Tseng, C., Wang, A.Z., Quadri, G.J., Albers Szafir, D., 2024. Revisiting categorical color perception in scatterplots: Sequential, diverging, and categorical palettes. In: Proc. EuroVis 2024 - Short Papers.
Vartak, M., Rahman, S., Madden, S., Parameswaran, A., Polyzotis, N., 2015. Seedb: Efficient data-driven visualization recommendations to support visual analytics. In: Proc. VLDB Endowment. Vol. 8, p. 2182.
Wang, A.Z., Borland, D., Gotz, D., 2024. An empirical study of counterfactual visualization to support visual causal inference. Inf. Vis. 14738716241229437.
Wang, Y., Sun, Z., Zhang, H., Cui, W., Xu, K., Ma, X., Zhang, D., 2019. Datashot: Automatic generation of fact sheets from tabular data. IEEE Trans. Vis. Comput. Graphics 26 (1), 895–905.
Wang, C., Thompson, J., Lee, B., 2023. Data formulator: AI-powered concept-driven visualization authoring. IEEE Trans. Vis. Comput. Graphics 1–11.
Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., Teal, T.K., 2017. Good enough practices in scientific computing. PLoS Comput. Biol. 13 (6), e1005510.
Wongsuphasawat, K., Moritz, D., Anand, A., Mackinlay, J., Howe, B., Heer, J., 2015. Voyager: Exploratory analysis via faceted browsing of visualization recommendations. IEEE Trans. Vis. Comput. Graphics 22 (1), 649–658.
Wongsuphasawat, K., Qu, Z., Moritz, D., Chang, R., Ouk, F., Anand, A., Mackinlay, J., Howe, B., Heer, J., 2017. Voyager 2: Augmenting visual analysis with partial view specifications. In: Proc. 2017 CHI Conf. Hum. Factors Comput. Syst. pp. 2648–2659.
Wood, J., Dykes, J., Slingsby, A., Clarke, K., 2007. Interactive visual exploration of a large spatio-temporal dataset: Reflections on a geovisualization mashup. IEEE Trans. Vis. Comput. Graphics 13 (6), 1176–1183.
Wu, Z., Chen, W., Ma, Y., Xu, T., Yan, F., Lv, L., Qian, Z., Xia, J., 2023a. Explainable data transformation recommendation for automatic visualization. Front. Inf. Technol. Electron. Eng. 24 (7), 1007–1027.
Wu, J., Chung, J.J.Y., Adar, E., 2023b. Viz2viz: Prompt-driven stylized visualization generation using a diffusion model. arXiv preprint arXiv:2304.01919.
Wu, A., Tong, W., Dwyer, T., Lee, B., Isenberg, P., Qu, H., 2020. Mobilevisfixer: Tailoring web visualizations for mobile phones leveraging an explainable reinforcement learning framework. IEEE Trans. Vis. Comput. Graphics 27 (2), 464–474.
Xiao, S., Huang, S., Lin, Y., Ye, Y., Zeng, W., 2023. Let the chart spark: Embedding semantic context into chart with text-to-image generative model. IEEE Trans. Vis. Comput. Graphics.
Xiong, C., Shapiro, J., Hullman, J., Franconeri, S., 2019. Illusion of causality in visualized data. IEEE Trans. Vis. Comput. Graphics 26 (1), 853–862.
Yu, B., Silva, C.T., 2019. Flowsense: A natural language interface for visual data exploration within a dataflow system. IEEE Trans. Vis. Comput. Graphics 26 (1), 1–11.
Zeng, Z., Moh, P., Du, F., Hoffswell, J., Lee, T.Y., Malik, S., Koh, E., Battle, L., 2021. An evaluation-focused framework for visualization recommendation algorithms. IEEE Trans. Vis. Comput. Graphics 28 (1), 346–356.
Zhou, Z., Meng, L., Tang, C., Zhao, Y., Guo, Z., Hu, M., Chen, W., 2018. Visual abstraction of large scale geospatial origin-destination movement data. IEEE Trans. Vis. Comput. Graphics 25 (1), 43–53.
Zhou, Z., Wang, W., Guo, M., Wang, Y., Gotz, D., 2022a. A design space for surfacing content recommendations in visual analytic platforms. IEEE Trans. Vis. Comput. Graphics 29 (1), 84–94.
Zhou, J., Wang, X., Wang, J., Ye, H., Wang, H., Zhou, Z., Han, D., Ying, H., Wu, J., Chen, W., 2023. FraudAuditor: A visual analytics approach for collusive fraud in health insurance. IEEE Trans. Vis. Comput. Graphics.
Zhou, J., Wang, X., Wong, J.K., Wang, H., Wang, Z., Yang, X., Yan, X., Feng, H., Qu, H., Ying, H., et al., 2022b. Dpviscreator: Incorporating pattern constraints to privacy-preserving visualizations via differential privacy. IEEE Trans. Vis. Comput. Graphics 29 (1), 809–819.
Zhou, Z., Wen, X., Wang, Y., Gotz, D., 2021. Modeling and leveraging analytic focus during exploratory visual analysis. In: Proc. 2021 CHI Conf. Hum. Factors Comput. Syst. pp. 1–15.