Data Visualization notes
UNIT – I
1.1 INTRODUCTION
This chapter provides a high-level introduction to data and information visualization: what visualizations are, why imagery is so important, how visualizations are applied to problem solving, and the process of visualization.
What is Visualization?
Everyday examples of visualizations include:
- a table in a newspaper
- a train or subway map with arrival timings
- a map of a region
- a weather chart
- a graph of stock market prices
There are many reasons why visualization is important. Perhaps the most obvious is that human beings rely on their eyes as one of their key senses for understanding information. The value of visualization is real, and this highlights the need to test how users interpret visualizations in specific decision-making processes. In larger applications, visualizations provide alternative views of the data and help describe its structure and reveal patterns or anomalies in the data.
1.2 HISTORY OF VISUALIZATION
In all visualizations, one can clearly see the use of the graphics primitives (points,
lines, areas, and volumes). Beyond the use of graphics, the most important aspect
of all visualizations is their connection to data.
However, visualization is more than simply computer graphics. The field of visualization encompasses aspects of numerous other disciplines, including human-computer interaction, perceptual psychology, databases, statistics, and data mining, to name a few. While computer graphics can be used to define and generate the displays that communicate the information, the sources of the data and the ways users interact with and perceive the data are all important components to understand when presenting information.
1.3 THE VISUALIZATION PROCESS
Data Collection and Preparation: The process begins with gathering data from relevant
sources and preparing it for visualization. This involves handling data quality issues, such as
missing values, inconsistencies, or redundancies, through cleaning and preprocessing.
Transformations like normalization or filtering may be applied to make the data suitable for
visual representation, ensuring it is accurate and ready for analysis. Properly prepared data
forms the foundation for effective visualization.
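For instance, a minimal preparation sketch in Python (pandas assumed available; the file and column names below are hypothetical):

```python
# A minimal data-preparation sketch with pandas. The file name, column
# names, and filter threshold are illustrative assumptions, not a fixed recipe.
import pandas as pd

df = pd.read_csv("measurements.csv")          # gather data from a source

# Handle quality issues: drop exact duplicates, fill missing numeric values
df = df.drop_duplicates()
df["temperature"] = df["temperature"].fillna(df["temperature"].mean())

# Transform: min-max normalization so attributes share a common [0, 1] scale
t = df["temperature"]
df["temperature_norm"] = (t - t.min()) / (t.max() - t.min())

# Filter: keep only the rows relevant to the visualization task
df = df[df["year"] >= 2000]
```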
Data Analysis and Selection: Once prepared, the data is analyzed to identify the key
variables, trends, and relationships that best address the visualization's goals. Through
statistical summaries and exploratory analysis, the data is reduced or refined to focus only on
the most relevant parts. This stage emphasizes selecting data elements that will effectively convey insights without overwhelming the user, especially when dealing with high-dimensional datasets.
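A brief sketch of this stage, continuing the hypothetical table from the previous example:

```python
# Exploratory analysis and column selection with pandas
# (df is the prepared table from the previous sketch; names are illustrative).
print(df.describe())                      # statistical summaries per column
print(df.corr(numeric_only=True))         # pairwise correlations

# Reduce to the variables most relevant to the visualization's goal
selected = df[["year", "temperature_norm"]]
```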
Mapping Data to Visual Structures: Here, data is mapped to visual elements, determining
how features like position, colour, size, or shape will represent different data attributes. This
mapping depends on the data type (e.g., categorical or quantitative) and the message intended
to be conveyed. The visual encodings chosen should capitalize on human perceptual abilities
to ensure clarity and enhance users’ understanding of the data through the visualization.
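A minimal sketch of such a mapping, assuming the selected table from above and matplotlib's built-in colormaps:

```python
# Mapping data attributes to visual channels: position encodes two
# quantitative attributes; colour and size encode a normalized value
# (column names continue the earlier hypothetical example).
import matplotlib.pyplot as plt
import matplotlib.cm as cm

xs = selected["year"].to_numpy()
vals = selected["temperature_norm"].to_numpy()     # already in [0, 1]
colors = cm.viridis(vals)                          # value -> colour
sizes = 20 + 80 * vals                             # value -> marker size

plt.scatter(xs, vals, c=colors, s=sizes)
```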
Design of Interaction Techniques: This step adds interactivity to allow users to engage
directly with the visualization. Interactive features like filtering, zooming, panning, and
highlighting empower users to explore data dynamically, focusing on specific areas or
discovering relationships. Well-designed interaction supports analytical tasks and should be
intuitive, allowing users to easily manipulate the visualization to deepen their insights and
exploration.
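A simplified stand-in for such interaction, where a range filter plays the role of a zoom or slider widget (names continue the earlier hypothetical example):

```python
# A minimal interactive-filtering sketch: the user supplies a range of
# interest and the view is redrawn on the filtered subset. A real toolkit
# would wire this to sliders or mouse events.
def filter_and_redraw(df, column, lo, hi):
    subset = df[(df[column] >= lo) & (df[column] <= hi)]
    plt.cla()                                  # clear the current axes
    plt.scatter(subset["year"], subset["temperature_norm"])
    plt.draw()
    return subset

# e.g., "zoom" the view to records from 2010 onwards
filter_and_redraw(selected, "year", 2010, 2024)
```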
Rendering the Visualization: This stage involves the actual implementation of the
visualization using software tools and libraries. Data transformations must be handled
efficiently, especially for real-time or large-scale visualizations, to ensure smooth rendering.
Attention to rendering performance is key to maintaining a responsive experience, with visual
elements presented clearly and in a way that enhances legibility, thus allowing the
visualization to serve as an effective communication tool.
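A short rendering sketch with matplotlib, finishing the running example:

```python
# Final presentation: labels and layout for legibility, then output
# to a file (or plt.show() for on-screen display). Title text is illustrative.
plt.xlabel("Year")
plt.ylabel("Normalized temperature")
plt.title("Temperature trend")
plt.tight_layout()                             # improve legibility
plt.savefig("temperature.png", dpi=150)
```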
Evaluation and Refinement: Evaluation assesses the visualization’s clarity, usability, and
effectiveness. By gathering qualitative feedback from users and quantitative measures such as
task completion time and error rates, developers can pinpoint areas for improvement. This
feedback guides iterative refinement, ensuring the visualization achieves its goals and is
intuitive for users, making it a continuous process to adapt the visualization based on user
experience.
Deployment and Maintenance: The final step is to deploy the visualization, making it
accessible to its target audience through integration in applications, websites, or dashboards.
Long-term usability requires regular updates to ensure data accuracy, compatibility with
technology, and responsiveness to user feedback. Maintenance is crucial to keep the
visualization relevant and effective, ensuring scalability and accessibility as needs evolve,
thus sustaining its impact over time.
1.4.1 The Computer Graphics Pipeline
Clipping: Clipping involves discarding any part of an object or scene that falls outside the
viewing frustum (the visible region that the camera can "see"). Objects may be transformed
into normalized viewing coordinates to simplify the clipping process. Clipping can actually be performed at many different stages of the pipeline.
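A minimal sketch of the idea, reduced to point clipping against a 2D window (real pipelines clip lines and polygons, e.g. with the Cohen-Sutherland or Sutherland-Hodgman algorithms):

```python
# Point clipping against a 2D viewing window: anything outside the
# window is discarded before later pipeline stages see it.
def clip_points(points, xmin, ymin, xmax, ymax):
    """Keep only the points inside the viewing window."""
    return [(x, y) for (x, y) in points
            if xmin <= x <= xmax and ymin <= y <= ymax]

# e.g., keep only points inside the unit square
visible = clip_points([(0.2, 0.5), (1.4, 0.3), (-0.1, 0.9)], 0, 0, 1, 1)
# -> [(0.2, 0.5)]
```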
Hidden Surface Removal: Polygons facing away from the camera, or those obscured by others, are removed or clipped. This process may be integrated into the projection process.
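A minimal back-face culling test, one common form of hidden surface removal:

```python
# Back-face culling: a polygon whose surface normal points away from
# the camera cannot be seen and is discarded.
def is_back_face(normal, view_dir):
    """normal and view_dir are 3D vectors; view_dir points from the
    camera toward the polygon. A positive dot product => facing away."""
    nx, ny, nz = normal
    vx, vy, vz = view_dir
    return nx * vx + ny * vy + nz * vz > 0

# e.g., a face whose normal points along +z, viewed along +z, is culled
is_back_face((0, 0, 1), (0, 0, 1))   # True
```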
Projection: 3D polygons are projected onto the 2D plane of projection, usually using a
perspective transformation. The results may be in a normalized 2D coordinate system or in device/screen coordinates.
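A minimal perspective projection sketch, using the similar-triangles form x' = d·x/z, y' = d·y/z for an image plane at z = d:

```python
# Perspective projection of a camera-space point onto the plane z = d.
def project(point, d=1.0):
    x, y, z = point
    return (d * x / z, d * y / z)

project((2.0, 1.0, 4.0))   # -> (0.5, 0.25): farther points appear smaller
```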
Rendering: Rendering is the final step in the computer graphics pipeline, where the 3D scene
is converted into a 2D image. This process takes all elements (geometry, textures, lighting,
and shading) and calculates the colour, brightness, and effects for each pixel in the final
image. Rendering can include various techniques to simulate realistic effects, such as
shadows, reflections, and textures.
Ray tracing: A variant on this pipeline involves casting rays from the camera through the
plane of projection to ascertain what polygons are hit. Secondary rays can also be generated
upon intersection with the surface, and the results accumulated. The key algorithms include mechanisms for combining the effects of these secondary rays.
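The core of ray casting is an intersection test; a minimal ray-sphere example (the sphere stands in for scene geometry):

```python
# Ray-sphere intersection: solves the quadratic |o + t*d - c|^2 = r^2
# for the ray parameter t of the nearest hit.
import math

def ray_hits_sphere(origin, direction, center, radius):
    """origin/direction define the ray; returns nearest t >= 0, or None."""
    ox, oy, oz = origin; dx, dy, dz = direction; cx, cy, cz = center
    fx, fy, fz = ox - cx, oy - cy, oz - cz        # origin relative to center
    a = dx*dx + dy*dy + dz*dz
    b = 2 * (fx*dx + fy*dy + fz*dz)
    c = fx*fx + fy*fy + fz*fz - radius*radius
    disc = b*b - 4*a*c
    if disc < 0:
        return None                               # ray misses the sphere
    t = (-b - math.sqrt(disc)) / (2*a)            # nearest intersection
    return t if t >= 0 else None

ray_hits_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0)   # -> 4.0
```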
1.4.2 The Visualization Pipeline
Data Modeling: The data to be visualized, whether from a file or database, has to be
structured to facilitate its visualization. The name, type, range and semantics of each attribute
or field of a data record must be available in a format that ensures rapid access and easy
modification.
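One possible (illustrative) way to structure such metadata in Python:

```python
# Data modeling: each field's name, type, range, and semantics are stored
# alongside the values for fast access and easy modification.
# The Field class and example fields are assumptions of this sketch.
from dataclasses import dataclass

@dataclass
class Field:
    name: str            # e.g. "temperature"
    dtype: str           # e.g. "float"
    value_range: tuple   # (min, max), useful for normalization
    semantics: str       # human-readable meaning/units

fields = [Field("year", "int", (1900, 2024), "calendar year"),
          Field("temperature", "float", (-40.0, 50.0), "deg C, daily mean")]
```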
Data Selection: Similar to clipping, this involves identifying the subset of the data that will potentially be visualized. This may occur under user control or via algorithmic methods, such as cycling through time slices or automatically detecting features of potential interest to the user.
Data to Visual Mappings: The heart of the visualization pipeline is performing the mapping of
data values to graphical entities or their attributes. This mapping often involves processing
the data prior to mapping, such as scaling, shifting, filtering, interpolating or subsampling.
Scene Parameter Setting (View Transformations): Like traditional graphics, the user must specify several attributes of the visualization that are relatively independent of the data. These include colour map selection, sound map selection, and lighting specifications.
1.4.3 The Knowledge Discovery Pipeline
Knowledge discovery (KD), also called data mining, has its own pipeline. Here, too, we start with data, but we process it with the goal of producing a model rather than a graphical display. The process structure is as follows.
Data: In the KD pipeline there is more focus on the data itself, as the graphics and visualization pipelines assume that data is already structured to facilitate display.
Data Integration, Cleaning, Warehousing, and Selection: This involves identifying the various data sets that will potentially be analysed, and applies filtering, sampling, and other techniques that help curate the data.
Data Mining: The heart of the KD pipeline is algorithmically analysing the data to produce a model.
Pattern Evaluation: The resulting model must be evaluated to determine its robustness, stability, precision, and accuracy.
Rendering or Visualization: The specific results must be presented to the user. Interactive visualization can be used at every step of the KD pipeline.
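A toy end-to-end sketch of the KD pipeline, with numpy's least-squares line fit standing in for a real data-mining algorithm (the data values are fabricated purely for illustration):

```python
# KD pipeline in miniature: curated data -> mined model -> evaluation -> report.
import numpy as np

# Data selection/curation: a small, already-cleaned series (illustrative)
years = np.array([2000, 2001, 2002, 2003, 2004], dtype=float)
temps = np.array([14.2, 14.4, 14.3, 14.6, 14.8])

# Data mining: fit a model (here, a linear trend)
slope, intercept = np.polyfit(years, temps, deg=1)

# Pattern evaluation: measure how well the model fits the data
pred = slope * years + intercept
rmse = np.sqrt(np.mean((temps - pred) ** 2))

# Rendering/visualization: present the result to the user
print(f"trend: {slope:+.3f} deg C/year, RMSE {rmse:.3f}")
```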
When writing pseudocode for data visualization, it is essential to follow conventions that
ensure clarity, readability, and maintainability. Here’s a list of common conventions:
Data: The working data table is assumed to contain only numeric values; an original data table containing non-numeric values is first converted to numeric form. When visualizing, the working data table is assumed to be the subset of the data selected for display.
m: The number of dimensions (columns) in the working data table. Dimensions are typically iterated over using j as the running dimension index.
n: The number of records (rows) in the working data table. Records are typically iterated over using i as the running record index.
NORMALIZE(record, dimension): A function that maps the value for the given record and dimension in the working data table to a value between min and max, or between zero and one if min and max are not specified.
COLOR(color): A function that sets the color state of the graphics environment to the specified color.
MAPCOLOR(record, dimension): A function that sets the color state of the graphics environment to the color derived from applying the global color map to the normalized value of the given record and dimension in the working data table.
CIRCLE(x, y, radius): A function that fills a circle centered at the location (x, y) with the given radius.
POLYLINE(xs, ys): A function that draws a polyline (many connected segments) through the given arrays of x and y coordinates.
POLYGON(xs, ys): A function that fills the polygon defined by arrays of x and y coordinates with the current color state.
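As a concrete illustration only, these conventions might be realized in Python with matplotlib; the list-of-rows table layout and the function bodies below are assumptions of this sketch, not part of the conventions themselves:

```python
# A minimal Python realization of the pseudocode conventions above.
import matplotlib.pyplot as plt
import matplotlib.cm as cm

data = [[1.0, 5.0], [2.0, 9.0], [3.0, 7.0]]   # working table: n rows, m columns
n, m = len(data), len(data[0])

def NORMALIZE(record, dimension, lo=0.0, hi=1.0):
    """Map data[record][dimension] into [lo, hi] ([0, 1] by default)."""
    col = [data[i][dimension] for i in range(n)]
    t = (data[record][dimension] - min(col)) / (max(col) - min(col))
    return lo + t * (hi - lo)

_color_state = "black"                         # current color state

def COLOR(color):
    global _color_state
    _color_state = color

def MAPCOLOR(record, dimension):
    # Apply the global color map (viridis here) to the normalized value.
    COLOR(cm.viridis(NORMALIZE(record, dimension)))

def CIRCLE(x, y, radius):
    plt.gca().add_patch(plt.Circle((x, y), radius, color=_color_state))

def POLYLINE(xs, ys):
    plt.plot(xs, ys, color=_color_state)

def POLYGON(xs, ys):
    plt.fill(xs, ys, color=_color_state)
```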
For geographic visualizations, the following functions are assumed to exist.