03.Graphics in R


Dr Colin S. Gillespie

Advanced R Graphics

Newcastle University
Contents

1 Background

2 ggplot2 overview

3 Plot building

4 A few other things

5 Reshaping data

6 R setup

Bibliography
“If I can’t picture it, I can’t understand it.”
Albert Einstein.

“The greatest value of a picture is when it forces us to notice what we never expected to see.”
John Tukey.
1 Background

1.1 Installing packages

Installing packages in R is straightforward. To install a package from the

command line we use the install.packages command. For example,
R> install.packages("ggplot2")
R> library(ggplot2)
For this course, the packages we use are given in chapter 6, table ??. To
update packages to their latest version, we use the update.packages()
command. However, you may need root access to update all packages.

1.2 Types of R graphics

1.2.1 Base graphics

Base graphics were written by Ross Ihaka based on his experience of
implementing the S graphics driver. If you have created a histogram,
scatter plot or boxplot, you’ve probably used base graphics. Base
graphics are generally fast, but have limited scope: you can only draw
on top of the plot and cannot edit or alter existing graphics. For
example, if you combine the plot and points commands, you have to
work out the x- and y- limits before adding the points.
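A minimal sketch of that limitation. It uses the built-in mtcars data rather than a course data set (an assumption, made so the snippet runs without extra packages): the axis limits have to be computed over all the points before the first call to plot, because later calls to points cannot rescale the axes.

```r
# Base graphics cannot rescale an existing plot, so the limits are
# computed over *all* the data before anything is drawn.
# mtcars ships with R and is used here purely for illustration.
x_lim = range(mtcars$disp)
y_lim = range(mtcars$mpg)

four = mtcars[mtcars$cyl == 4, ]
rest = mtcars[mtcars$cyl != 4, ]

# First call sets up the axes using the precomputed limits
plot(four$disp, four$mpg, xlim = x_lim, ylim = y_lim,
     xlab = "Displacement", ylab = "Miles per gallon")
# Later calls only draw on top of the existing axes
points(rest$disp, rest$mpg, col = 2)
```

If the limits were left out of the plot call, any of the remaining cars falling outside the range of the four-cylinder cars would silently not appear.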

1.2.2 Grid graphics

Grid graphics were developed by Paul Murrell [1]. Grid grobs (graphical
objects) can be represented independently of the plot and modified later.
The viewports system makes it easier to construct complex plots. Grid
doesn’t provide tools for producing complete graphics; it provides the
primitives for creating plots. Lattice and ggplot2 graphics use grid.

[1] P Murrell. R Graphics. CRC Press, 2nd edition, 2011.

1.2.3 Lattice graphics

The lattice package uses grid graphics to implement the trellis graphics
system [2]. It produces nicer plots than base graphics and legends are
automatically generated. I initially started using lattice before ggplot2.
However, I found it a bit confusing and so switched to ggplot2.

[2] D Sarkar. Lattice: Multivariate Data Visualization with R (Use R!). Springer, 1st edition, 2008.
Table 1.1: The last five cars in the mpg dataset. The variables cty and hwy
record miles per gallon for city and highway driving respectively. The
variable displ is the engine displacement in litres.

manufacturer  model   displ  year  cyl  trans       cty  hwy  class
volkswagen    passat  2.0    2008  4    auto(s6)    19   28   midsize
volkswagen    passat  2.0    2008  4    manual(m6)  21   29   midsize
volkswagen    passat  2.8    1999  6    auto(l5)    16   26   midsize
volkswagen    passat  2.8    1999  6    manual(m5)  18   26   midsize
volkswagen    passat  3.6    2008  6    auto(s6)    17   26   midsize

1.2.4 ggplot2 graphics

ggplot2 started in 2005 [3] and follows the “Grammar of Graphics” [4]. Like
lattice, ggplot2 uses grid to draw graphics, which means you can
exercise low-level control over the plot appearance.

[3] H Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer, New York, 2009. ISBN 978-0-387-98140-6.
[4] We’ll come on to that later.

1.3 Data sets

Throughout the course, we will use a few different datasets.

1.3.1 Fuel economy data

This dataset includes car make, model, class, engine size and fuel
economy for a selection of US cars in 1999 and 2008. It is included
with the ggplot2 package [5] and is loaded using the data function:
R> library(ggplot2)
R> data(mpg)
Table 1.1 gives the last five cars in this data set.

[5] The data originally comes from the EPA fuel economy website, http://fueleconomy.gov
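As a small sketch, the rows shown in table 1.1 can be pulled out with tail. The column selection is an assumption about which variables the table displays; the mpg data set has a couple of further columns that the table omits.

```r
library(ggplot2)
data(mpg)

# Columns shown in table 1.1 (the data set also has others, e.g. drv and fl)
cols = c("manufacturer", "model", "displ", "year", "cyl",
         "trans", "cty", "hwy", "class")

# The last five rows of the data set
last_five = tail(mpg[, cols], 5)
print(last_five)
```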

1.3.2 The tips data set

A single waiter recorded information about each tip he received over a
few months while working in a particular restaurant. He collected data
on several variables:

tip ($),

bill ($),

gender of the bill payer,

whether there were smokers in the party,

day of the week [6],

time of day,

party size.

[6] The waiter only worked Thursday, Friday, Saturday and Sundays.

There were a total of 244 tips. The first few rows of this data set are
shown in table 1.2. The data comes with the reshape2 package and is
loaded using the data function:
R> library(reshape2)
R> data(tips)

Table 1.2: The first five rows of the tips data set. There are 244 rows in
this data set.

total_bill  tip   sex     smoker  day  time    size
16.99       1.01  Female  No      Sun  Dinner  2
10.34       1.66  Male    No      Sun  Dinner  3
21.01       3.50  Male    No      Sun  Dinner  3
23.68       3.31  Male    No      Sun  Dinner  2
24.59       3.61  Female  No      Sun  Dinner  4

1.3.3 Movie data set

The internet movie database [7] is a website devoted to collecting movie
data supplied by studios and fans. It claims to be the biggest movie
database on the web and is run by Amazon. More information about
IMDB can be found online at

http://imdb.com/help/show_leaf?about

including information about the data collection process

http://imdb.com/help/show_leaf?infosource

IMDB makes their raw data available at http://uk.imdb.com/interfaces/.

Example rows are given in table 1.3. This data set contains information
on over 50,000 movies. We will use this dataset to illustrate the concepts
covered in this class. This is the full version of the data set used in the
Introduction to R course.

[7] http://imdb.com/

The dataset contains the following fields:

Title. Title of the movie.

Year. Year of release.

Budget. Total budget in US dollars. If the budget isn’t known, then
it is stored as ‘-1’.

Length. Length in minutes.

Rating. Average IMDB user rating.

Votes. Number of IMDB users who rated this movie.

r1: Multiplying by ten gives the percentage (to the nearest 10%) of
users who rated this movie a 1.

r2 – r10: Similar to r1.

mpaa. The MPAA rating - PG, PG-13, R, NC-17.

Action, Animation, Comedy, Drama, Documentary, Romance, Short.
Binary variables representing whether the movie was classified as
belonging to that genre. A movie can belong to more than one genre.
See, for example, the film Ablaze in table 1.3.

This data set is part of the ggplot2 package:

R> library(ggplot2)
R> data(movies)
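The command above reflects the ggplot2 release these notes were written for. In later ggplot2 releases the movies data was moved into a separate package, ggplot2movies. A hedged sketch that loads it from there when that package happens to be installed (the variable name movies_available is purely illustrative):

```r
# In recent ggplot2 releases the movies data lives in the separate
# ggplot2movies package; check for it before trying to load the data.
movies_available = requireNamespace("ggplot2movies", quietly = TRUE)

if (movies_available) {
  data(movies, package = "ggplot2movies")
  print(dim(movies))
} else {
  message("Install ggplot2movies to obtain the movies data set")
}
```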
Table 1.3: Sample rows of the movie data set. Credit: This data set was
initially constructed by Hadley Wickham at http://had.co.nz/.

                                                       Voting statistics       Movie genre
Title           Year  Length  Budget    Rating  Votes  r1    ...  r10   mpaa   Action  Animation  Comedy  Drama  Documentary  Romance  Short
A.k.a. Cassius  1970  85      -1        5.7     43     4.5   ...  14.5  PG     0       0          0       0      1            0        0
AKA             2002  123     -1        6.0     335    24.5  ...  14.5  R      0       0          0       1      0            0        0
Alien Vs. Pred  2004  102     45000000  5.4     14651  4.5   ...  4.5   PG-13  1       0          0       0      0            0        0
Abandon         2002  99      25000000  4.7     2364   4.5   ...  4.5   PG-13  0       0          0       1      0            0        0
Abendland       1999  146     -1        5.0     46     14.5  ...  24.5  R      0       0          0       0      0            0        0
Aberration      1997  93      -1        4.8     149    14.5  ...  4.5   R      0       0          0       0      0            0        0
Abilene         1999  104     -1        4.9     42     0.0   ...  24.5  PG     0       0          0       1      0            0        0
Ablaze          2001  97      -1        3.6     98     24.5  ...  14.5  R      1       0          0       1      0            0        0
Abominable Dr   1971  94      -1        6.7     1547   4.5   ...  14.5  PG-13  0       0          0       0      0            0        0
About Adam      2000  105     -1        6.4     1303   4.5   ...  4.5   R      0       0          1       0      0            1        0
2 ggplot2 overview

ggplot2 is a bit different from other graphics packages. It roughly
follows the philosophy of Wilkinson, 1999 [1]. Essentially, we think about
plots as layers. By thinking of graphics in terms of layers it is easier
for the user to iteratively add new components and for a developer to
add new functionality.

[1] L Wilkinson. The Grammar of Graphics. Springer, 1st edition, 1999.

2.1 A basic plot using base graphics

A reasonable first attempt at analysing this data would be to produce a
scatter plot of (for example), engine displacement against city miles per
gallon. To use base graphics, we would first construct a basic scatter
plot of the data where the cylinder size is 4 [2]:

[2] We’ve cheated here and pretended that we know the x- and y- limits.
R> plot(mpg[mpg$cyl==4,]$displ,
+ mpg[mpg$cyl==4,]$cty,
+ xlim=c(1,8), ylim=c(5,35))
Next we add in the other cars corresponding to different cylinder sizes:
R> points(mpg[mpg$cyl==5,]$displ, mpg[mpg$cyl==5,]$cty,
+ col=2)
R> points(mpg[mpg$cyl==6,]$displ, mpg[mpg$cyl==6,]$cty,
+ col=3)
R> points(mpg[mpg$cyl==8,]$displ, mpg[mpg$cyl==8,]$cty,
+ col=4)
This would produce figure 2.1. A few points to note:

We have to manually set the scales in the plot command using xlim
and ylim.

We haven’t created a legend. We would need to use the legend
function.

The default axis labels are terrible - mpg[mpg$cyl==4,]$displ

If we wanted to look at highway miles per gallon, this is a bit of a
pain.

Figure 2.1: A scatter plot of engine displacement vs average city miles per
gallon. The coloured points correspond to different cylinder sizes. The plot
was constructed using base graphics.

Table 2.1: Basic geom’s and their corresponding standard plot names.

Plot Name        Geom       Base graphic
Barchart         bar        barplot
Box-and-whisker  boxplot    boxplot
Histogram        histogram  hist
Line plot        line       plot and lines
Scatter plot     point      plot and points

Let’s now consider the equivalent ggplot2 graphic - figure 2.2. After
loading the necessary library, the plot is generated using the following
code:

R> g = ggplot(data=mpg, aes(x=displ, y=cty))
R> g + geom_point(aes(colour=factor(cyl)))

Figure 2.2: As figure 2.1, but created using ggplot2.

The ggplot2 code is fundamentally different from the base code.
The ggplot function sets the default data set, and attributes called
aesthetics. The aesthetics are properties that are perceived on the
graphic. A particular aesthetic can be mapped to a variable or set to
a constant value. In figure 2.2, the variable displ is mapped to the
x-axis and the cty variable is mapped to the y-axis.

The other function, geom_point, adds a layer to the plot. The x and
y variables are inherited (in this case) from the first function, ggplot,
and the colour aesthetic is set to the cyl variable. Other possible
aesthetics are, for example, size, shape and transparency. In figure 2.2
these additional aesthetics are left at their default value.
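To make the split between plot defaults and layers concrete, the following sketch inspects the two objects. It relies on the layers and mapping components of a ggplot object, which are implementation details that may change between ggplot2 releases, so treat it as an illustration rather than a supported interface.

```r
library(ggplot2)

# Plot defaults only: data plus the x and y mappings
g = ggplot(data = mpg, aes(x = displ, y = cty))
# Defaults plus one point layer carrying its own colour mapping
p = g + geom_point(aes(colour = factor(cyl)))

# Number of layers in each object
n_layers_g = length(g$layers)
n_layers_p = length(p$layers)

# Aesthetics set at the plot level and at the layer level
plot_aes = names(g$mapping)
layer_aes = names(p$layers[[1]]$mapping)

print(c(n_layers_g, n_layers_p))
print(plot_aes)
print(layer_aes)
```

The point layer only stores the colour mapping; x and y are picked up from the plot defaults when the plot is built.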
This approach is very powerful and enables us to easily create
complex graphics. For example, we could create a plot where the size
of the points depends on an additional factor:
R> p = g + geom_point(aes(size=factor(cyl)))
which gives figure 2.3 or we could create a line chart
R> p = g + geom_line(
+ aes(colour=factor(cyl), size = factor(cyl)))
to get figure 2.4. Of course, figures 2.3 and 2.4 aren’t particularly good
plots, they just illustrate the general idea.

Figure 2.3: As figure 2.2, but where the size aesthetic depends on cylinder
size.

Figure 2.4: As figure 2.2, but using geom_line.

Points, bars and lines are all examples of geom’s or geometric
objects. Typically, if we use a single geom, we get a standard plot.
Table 2.1 summarises some standard geoms and their equivalent base
graphic counterpart.

However, using the idea of a graphical grammar, we can construct
more complicated plots. For example, this code
R> p = g + geom_point(aes(colour=factor(cyl))) +
+ stat_smooth(aes(colour=factor(cyl)))
produces figure 2.5, which doesn’t really have a simple name.

Figure 2.5: As figure 2.2, but with loess regression lines.

In each ggplot2 command, we are adding (multiple) layers. A single
layer comprises four elements:

an aesthetic and data mapping;

a statistical transformation (stat);

a geometric object (geom);

and a position adjustment, i.e. how should objects that overlap be
handled.

When we use the command

R> g + geom_point(aes(colour=factor(cyl)))
this is actually a shortcut for the command:
R> g + layer(
+ data = mpg,#inherited
+ mapping = aes(color=factor(cyl)),#x,y are inherited
+ stat = "identity",
+ geom = "point",
+ position = "identity"
+ )
In practice, we never use the layer function. Instead, we use

geom_* which creates a layer with a specific geom (and various
defaults including a stat) and/or

stat_* which creates a layer with a specific stat (and various defaults
including a geom) or

qplot which creates a ggplot and a layer.

qplot is short for quick plot. I don’t cover qplot in this course. If you find
yourself using ggplot2 a lot, then it is worth the time investment.
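The defaults that the geom_* and stat_* functions fill in can be inspected directly. In the sketch below, layer_parts is a hypothetical helper written for this illustration, and the geom, stat and position components it reads are internal details of a layer object rather than a documented interface.

```r
library(ggplot2)

# Hypothetical helper: report which geom, stat and position adjustment
# a layer will use, by looking at the class of each component.
layer_parts = function(l) {
  c(geom = class(l$geom)[1],
    stat = class(l$stat)[1],
    position = class(l$position)[1])
}

# A geom_* function fills in a default stat and position
point_parts = layer_parts(geom_point())
hist_parts = layer_parts(geom_histogram())
# A stat_* function fills in a default geom
smooth_parts = layer_parts(stat_smooth())

print(point_parts)
print(hist_parts)
print(smooth_parts)
```

This is the sense in which every geom has a default stat and every stat has a default geom.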
3 Plot building

3.1 The basic plot object

To create an initial ggplot object, we use the ggplot() function. This
function has two arguments:

data and

an aesthetic mapping.

These arguments set up the defaults for the various layers that are added
to the plot and can be empty. For each plot layer, these arguments can
be overwritten. The data argument is straightforward - it is a data
frame [1]. The mapping argument creates default aesthetic attributes.
For example
R> g = ggplot(data=mpg,
+ mapping=aes(x=displ, y=cty, colour=factor(cyl)))
or equivalently,
R> g = ggplot(mpg, aes(displ, cty, colour=factor(cyl)))
The above commands don’t actually produce anything to be displayed;
we need to add layers for that to happen.

[1] ggplot2 is very strict regarding the data argument. It doesn’t accept
matrices or vectors. The underlying philosophy is that ggplot2 takes care of
plotting, rather than massaging the data into other forms. If you want to do
some data manipulation, then use other tools.
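As a sketch of overriding the defaults in a single layer: the first layer below uses the plot-level data, while the second supplies its own data frame and an extra mapping. The choice of the eight-cylinder cars is purely illustrative, and reading the data component of a layer is an implementation detail.

```r
library(ggplot2)

g = ggplot(mpg, aes(displ, cty))

# Layer 1 inherits data and mappings from ggplot();
# layer 2 overrides the data and adds a colour mapping.
p = g +
  geom_point(colour = "grey70") +
  geom_point(data = subset(mpg, cyl == 8),
             aes(colour = factor(cyl)))

# Number of rows used by the second layer
n_override = nrow(p$layers[[2]]$data)
print(n_override)
```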

3.2 Geometric objects

geom’s or geometric objects are used to perform the actual rendering
in a plot. For example, we have already seen that a line geom will
create a line plot and a point geom creates a scatter plot. Each geom
has a list of aesthetics that it expects [2]. However, some geoms have
unique elements. The error bar geom requires arguments ymax and
ymin. Table 3.1 gives some standard geoms [3].

[2] For example, x, y, colour and size.
[3] For a full list, see table 4.2 of the ggplot2 book or online at http://had.co.nz/ggplot2/.

Table 3.1: A few standard geom’s in ggplot2.

Name        Description
abline      Line, specified by slope and intercept
boxplot     Box and whiskers plot
density     Kernel density plot
density 2d  Contours from a 2d density estimate
histogram   Histograms
jitter      Individual points are jittered to avoid overlap
smooth      Add a smoothed conditional mean
step        Connect observations by stairs

3.2.1 Example: combining geoms

Let’s look at the tips data set - see §1.3.2 for a description. We begin
by creating a base ggplot object

R> g = ggplot(tips, aes(x=size, y=tip))

Remember, the above piece of code doesn’t display anything. Now we’ll
create a boxplot using the boxplot geom:
R> g1 = g + geom_boxplot()
This produces figure 3.1. Notice that the default axis labels are the
column headings of the associated data frame.

Figure 3.1: A boxplot of tips earned by the waiter.

Figure 3.1 is a boxplot of all the tips data; a more useful plot would be
to have individual boxplots conditional on table size:
R> g2 = g + geom_boxplot(aes(group=size))
Notice that we have included a group aesthetic to the boxplot geom.
Many geom’s have this aesthetic. For example, if we used geom_line,
then we would have individual lines for each size - this doesn’t make
much sense in this scenario.

Figure 3.2: Boxplots of tips, conditional on table size.

We are not restricted to a single geom - we can add multiple geoms.
When data sets are reasonably small, it is useful to display the data on
top of the boxplots:
R> ##We need to jitter the points to avoid overlap
R> ##We colour the points depending on whether the
R> ##person is smoker
R> g3 = g2 +
+ geom_jitter(position=position_jitter(width=0.3),
+ aes(colour=smoker))
This generates figure 3.3. Since the points would all fall on straight
lines, we use the jitter geom to wiggle the points about their axis. We
also colour the points conditional on whether someone at the table
smoked using the colour aesthetic.

Figure 3.3: As figure 3.2, but including the data points.

3.3 Standard plots

There are a few standard geom’s that are particularly useful:

geom_boxplot: produces a boxplot - see figure 3.1.

geom_point: a scatter plot - see figure 3.3.

geom_bar: produces a standard barplot that counts the x values.
For example, to generate a bar plot in figure 3.4 of the MPAA ratings
in the movie data set, we use the following code:
R> h = ggplot(movies, aes(x=mpaa)) +
+ geom_bar()

Figure 3.4: A bar chart of the MPAA rating.

geom_line: a line plot - see practical 3.

geom_text: adds labels to specified points. This has an additional
(required) aesthetic: label. Other useful aesthetics, such as hjust
and vjust control the horizontal and vertical position. The angle
aesthetic controls the text angle.

geom_raster: Similar to levelplot or image. For example,
R> set.seed(1)
R> example = expand.grid(x=1:10, y=1:10)
R> example$z = runif(100)
R> ggplot(example, aes(x, y)) + geom_raster(aes(fill=z))
generates figure 3.5. If the squares are unequal, then use the (slower)
geom_tile function.

Figure 3.5: A heatmap of some example data using geom_raster. New to
version 0.9.

3.4 Aesthetics
The key to successfully using aesthetics is remembering that the aes()
function maps data to an aesthetic. If the parameter is not data or is
constant, then don’t put it in an aesthetic. Only parameters that are
inside of an aes() will appear in the legend. To illustrate these ideas,
we’ll generate a simple scatter-plot:
R> d = data.frame(x=1:50, y = 1:50, z = 0:9)
R> g_aes = ggplot(d, aes(x = x, y = y))
R> g_aes + geom_point(aes(colour = z))
which gives figure 3.6. Here the z variable has been mapped to the
colour aesthetic. Since this parameter is continuous, ggplot2 uses
a continuous colour palette. Alternatively, if we make z a factor or a
character, ggplot2 uses a different colour palette:
R> g_aes + geom_point(aes(colour=factor(z)))
to get figure 3.7. If we set the aesthetic to a constant value (figure 3.8)
R> g_aes + geom_point(aes(colour="Blue"))
the resulting plot is unlikely to be what we intended. The value ‘Blue’
is just treated as a standard factor. Instead, you probably wanted
R> g_aes + geom_point(colour="Blue")

Figure 3.6: Illustration of the continuous colour aesthetic.

Figure 3.7: Illustration of the discrete colour aesthetic.

Figure 3.8: Illustration of a constant colour aesthetic.

Another important point is that when you specify mappings inside
ggplot(aes()), these mappings are inherited by every subsequent
layer. This is fine for x and y, but can cause trouble for other aesthetics.
For example, using the colour aesthetic is fine for geom_line, but may
not be suitable for geom_text.
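The difference between mapping and setting a colour can also be checked without looking at the picture. This sketch uses ggplot_build, which returns the data each layer will draw; the variable names mapped and fixed are illustrative only.

```r
library(ggplot2)

d = data.frame(x = 1:50, y = 1:50, z = 0:9)
g_aes = ggplot(d, aes(x = x, y = y))

# "Blue" inside aes() is treated as data: a one-level discrete variable
mapped = ggplot_build(g_aes + geom_point(aes(colour = "Blue")))$data[[1]]
# "Blue" outside aes() is a constant passed straight to the geom
fixed = ggplot_build(g_aes + geom_point(colour = "Blue"))$data[[1]]

# Colour actually drawn in each case
print(unique(mapped$colour))
print(unique(fixed$colour))
```

In the mapped case the colour comes from the default discrete palette, which is why the points in figure 3.8 are not blue.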

Table 3.2: Standard aesthetics. Individual geom’s may have other aesthetics.
For example, geom_text uses label and geom_boxplot has, amongst
other things, upper.

Aesthetic  Description
linetype   Similar to lty in base graphics
colour     Similar to col in base graphics
size       Similar to cex in base graphics
fill       See figure 3.5.
shape      Glyph choice
alpha      Control the transparency

There are a few standard aesthetics that appear in most, but not
all, geom’s and stat’s (see table 3.2). Individual geom’s can have
additional optional and required aesthetics. See their help file for
further information.

3.5 Statistical transformations

Statistical transformations, or stat’s, transform the data. For example,
in figure 2.5 we use a loess smoother function (conditional on the
number of cylinders) to plot the overall data trend. Remember, all
geoms have stats and, vice versa, all stats have geoms.
A stat takes a dataset as input and returns a dataset as an output.
For example, the boxplot stat (used by the boxplot geom) takes in a
data set and produces the following variables:

lower

upper

middle

ymin: bottom (vertical minimum)

ymax: top (vertical maximum).

Typically, these statistics are used by the boxplot geom. Equally, they
could be used by the error bar geom.
A widely used stat is identity. This stat does not alter the underlying
data and is used by a number of geoms, such as geom_point and
geom_line.
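The data set returned by the boxplot stat can be inspected with ggplot_build. The sketch below uses the mpg data, with one box per number of cylinders, purely as an illustration.

```r
library(ggplot2)

p = ggplot(mpg, aes(x = factor(cyl), y = cty)) + geom_boxplot()

# One row per box, holding the variables computed by the boxplot stat
box_stats = ggplot_build(p)$data[[1]]
print(box_stats[, c("ymin", "lower", "middle", "upper", "ymax")])
```

Each row of box_stats is what the boxplot geom draws; the same columns could be handed to a different geom, such as the error bar geom.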

3.5.1 Example: combining stats

Perhaps the easiest stat to consider is the stat_summary function.
This function summarises y values at every unique x value. This is
quite handy, for example, when adding single points that summarise
the data or adding error bars.

Table 3.3: Standard stat’s in ggplot2.

Name        Description                                         Comment
bin         Bin data                                            histogram
boxplot     Calculates the components of box-and-whisker plots  See geom_boxplot
contour     Contours of 3d data
density     1d density estimation
density 2d  2d density estimation
function    Superimpose a function
identity    Leave the data untouched                            Used in most geoms
qq          Calculation for q-q plots
quantile    Continuous quantiles
smooth      Add a smoother
spoke       Convert angle and radius to xend and yend
step        Create stair steps                                  See geom_step
sum         Sum unique values
summary     Summarises y values at every unique x
unique      Remove duplicates

Figure 3.9: Average tip amount conditional on table size.

A simple plot to create is the mean tip amount based on table size,
figure 3.9:
R> g4 = g + stat_summary(geom="point", fun.y= mean)
In the above piece of code we calculate the mean tip for each unique
x value, that is, for different table sizes. These x-y values are passed to
the point geom. We can use any function for fun.y provided it takes
in a vector and returns a single point. For example, we could calculate
the ratio of the mean and median, as in figure 3.10:
R> g5 = g + stat_summary(geom="point",
+ fun.y= function(i) mean(i)/median(i))

Figure 3.10: The ratio of the mean to median tip amount conditional on
table size.

As with the geom example, we can combine multiple stats:
R> g6 = g +
+ stat_summary(fun.ymin = function(i) quantile(i, 0.05),
+ fun.ymax = function(i) quantile(i, 0.95),
+ colour = "blue", geom="errorbar",
+ width=0.2) +
+ stat_smooth(aes(colour=smoker, lty=smoker),
+ se=FALSE, method="lm")
Using the stat_summary function, we have created error bars that
span the 5% to 95% quantile range. The stat_smooth function plots the
regression lines, conditional on whether someone on the table smokes -
figure 3.11.

Figure 3.11: The 5% to 95% quantile range of the tip amount displayed
using error bars. The stat_smooth function is used to add OLS regression
lines, conditional on whether anyone in the party smoked.
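What stat_summary computes can be checked against a calculation by hand. The sketch below makes two assumptions: it uses the mpg data instead of the tips data, so that only ggplot2 is needed, and it uses the argument name fun, which replaced fun.y in later ggplot2 releases (3.3.0 onwards).

```r
library(ggplot2)

# Mean city mpg at each unique number of cylinders
p = ggplot(mpg, aes(x = cyl, y = cty)) +
  stat_summary(fun = mean, geom = "point")

# The data produced by the stat: one row per unique x value
from_stat = ggplot_build(p)$data[[1]]
from_stat = from_stat[order(from_stat$x), ]

# The same summary computed directly with base R
by_hand = tapply(mpg$cty, mpg$cyl, mean)

print(from_stat[, c("x", "y")])
print(by_hand)
```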

3.6 Facets
Faceting is a mechanism for automatically laying out multiple plots on
a page. The data is split into subsets, with each subset plotted onto a
different panel. ggplot2 has two types of faceting:

facet_grid: produces a 2d panel of plots where variables define
rows and columns.

facet_wrap: produces a 1d ribbon of panels which can be wrapped
into 2d.

3.6.1 Facet grid

The function facet_grid lays out the plots in a 2d grid. The faceting
formula specifies the variables that appear in the columns and rows.
Suppose we are interested in movie length. A first plot we could
generate is a basic histogram:
R> g = ggplot(movies, aes(x=length)) + xlim(0, 200)
R> g + geom_histogram(binwidth=3)
This produces figure 3.12. Notice that we have altered the x-axis since
there are a couple of outlying films and adjusted the binwidth in the
histogram. The data is clearly bimodal. Some movies are fairly short,
whilst others have an average length of around one hundred minutes.
We will now use faceting to explore the data further.

Figure 3.12: A histogram of movie length.

y ~ .: a single column with multiple rows. This can be handy
for double column journals. For example, to create histograms
conditional on whether they are comedy films, we use:
R> g + geom_histogram(aes(y=..density..), binwidth=3) +
+ facet_grid(Comedy ~ .)
This gives figure 3.13. Since there are many more non-comedy than
comedy films, we use the density in the histogram (look at the
y-axis).

Figure 3.13: Movie length conditional on whether it is a comedy.

. ~ x: a single row with multiple columns. Very useful on wide
screen monitors. In this piece of code, we create kernel density plots,
conditional on whether the movie was animated:
R> g + geom_density(aes(y=..density..)) +
+ facet_grid(. ~ Animation)
From figure 3.14, it’s clear that the majority of animated films are
short. For illustration purposes, we have used the geom_density
function in figure 3.14.

Figure 3.14: Density plots of movie length conditional on animation.

y ~ x: multiple rows and columns. Typically the variable with the
greatest number of levels is used for the columns. We can also add
marginal plots when using facet_grid. By default, margins=FALSE.
R> g + geom_histogram(aes(y=..density..), binwidth=3) +
+ facet_grid(Comedy ~ Animation, margins=TRUE)

Figure 3.15: Movie length conditional on animation and comedy status.
Marginal histograms are along the bottom row and the right hand column.

Figure 3.15 splits movie length by comedy and animation. Since we
requested margins, we also have the marginal plots. Notice that the
plot in the bottom right corner shows the same data as figure 3.12,
but on the density scale.

The panel labels aren’t that helpful - they are either 0 or 1. By
default ggplot2 uses the values set in the data frame. Typically I
use more descriptive names in my data frame so the default is more
appropriate.
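One way to get more descriptive panel labels is to recode the 0/1 indicator as a factor before plotting. The sketch below uses a tiny made-up data frame (films and its values are hypothetical) so that it runs with base R alone.

```r
# A hypothetical miniature of the movie data: length and a 0/1 indicator
films = data.frame(length = c(95, 7, 110, 6),
                   Comedy = c(0, 1, 1, 0))

# Recode the indicator so the facet labels are self-explanatory
films$Comedy = factor(films$Comedy, levels = c(0, 1),
                      labels = c("Not comedy", "Comedy"))

print(levels(films$Comedy))
print(table(films$Comedy))
```

Faceting on the recoded column would then label the panels "Not comedy" and "Comedy" rather than 0 and 1.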

3.6.2 Controlling facet scales

For both facet_grid and facet_wrap we can allow the scale to be the
same in all panels (fixed) or vary between panels. This is controlled by
the scales parameter in the facet_* function:

scales = "fixed": x and y scales are fixed across all panels (default).

scales = "free": x and y scales vary across all panels.

scales = "free_x": the x scale is free.

scales = "free_y": the y scale is free.

We will experiment with these in the practical session.

Figure 3.16: Movie length conditional on the decade the movie was created.

Table 3.4: Standard scales in ggplot2. In the table, replace * with either
scale_x or scale_y. Common arguments are breaks, labels, na.value,
trans and limits. See the help files for further details.

Function           Description
*_continuous(...)  Main scale function.
*_log10(...)       log10 transformation.
*_reverse(...)     Reverse the axis.
*_sqrt(...)        The square root transformation.
*_datetime(...)    Precise control over dates and times.
*_discrete(...)    Not usually needed - see §6.3 of Wickham, 2009.

3.6.3 Facet wrap

The facet_wrap function creates a 1d ribbon of plots. This can be
quite handy when trying to save space. To illustrate, let’s examine
movie length by decade. First, we create a new variable for the movie
decade [5]:
R> movies$decade = round_any(movies$year, 10, floor)
Then to generate the ribbon of histograms (figure 3.16), we use the
facet_wrap function:
R> ggplot(movies, aes(x=length)) + geom_histogram() +
+ facet_wrap( ~ decade, ncol=6) + xlim(0, 200)
As before, we truncate the x-axis. Since we have counts on the y-axis,
we notice that the number of movies made has increased through time.
Also, shorter movies were popular in the 1950’s and 1960’s.

[5] The function round_any is part of the plyr package.
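If the plyr package isn't to hand, the decade calculation is easy to sketch in base R: round_any(x, 10, floor) amounts to dividing by ten, taking the floor and multiplying back. The years below are made up for illustration.

```r
# Illustrative release years
year = c(1893, 1950, 1999, 2005)

# Base R equivalent of round_any(year, 10, floor)
decade = floor(year / 10) * 10
print(decade)  # 1890 1950 1990 2000
```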

3.7 Axis Scales

When we create complex plots involving multiple layers, ggplot2 uses
an iterative process to calculate the correct scales. For example, if
in figure 3.6 we only plotted the regression lines, ggplot2 would
reduce the y-axis scale. We can set the scales explicitly using the xlim
and ylim functions. However, if we use these functions, any data that
falls outside of the plotting region isn’t plotted and isn’t used in
statistical transformations, for example, when calculating the binwidth
in histograms. If you want to zoom into a plot region, then use
coord_cartesian(xlim = c(..,..)) instead.
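The difference between setting limits and zooming can be seen in the data that reaches the histogram stat. This sketch uses the mpg data and an arbitrary window of 2 to 4 litres (both assumptions made for illustration); building the clipped plot is expected to produce a warning about removed rows.

```r
library(ggplot2)

g = ggplot(mpg, aes(x = displ)) + geom_histogram(binwidth = 0.5)

# xlim() discards observations outside the limits before binning
clipped = ggplot_build(g + xlim(2, 4))$data[[1]]
# coord_cartesian() bins everything, then zooms the view
zoomed = ggplot_build(g + coord_cartesian(xlim = c(2, 4)))$data[[1]]

# Total number of observations that were binned in each case
n_clipped = sum(clipped$count)
n_zoomed = sum(zoomed$count)
print(c(clipped = n_clipped, zoomed = n_zoomed, all = nrow(mpg)))
```

With coord_cartesian every car contributes to the bins; with xlim only the cars inside the window do.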

At times, we may want to transform the data. A standard example is
the log transformation. Suppose we wanted to create a scatter plot of
length against budget. We remove any movies that have a zero budget
or length. Then we could use the following commands
R> h = ggplot(subset(movies, length>0 & budget>0),
+ aes(y=length)) + ylim(0, 500)
R> h1 = h + geom_point(aes(budget), alpha=0.2)
to get figure 3.17. Notice that we have changed the alpha transparency
value to help with over plotting.

Figure 3.17: Scatter plot of movie budget against length.

To plot the log budgets, there are two possibilities. First, we could
transform the data
R> h2 = h + geom_point(aes(log10(budget)), alpha=0.2)
to get figure 3.18. Note that ylim(0, 500) is shorthand for
scale_y_continuous(limits=c(0, 500)).

Figure 3.18: Scatter plot of movie log10(budget) against length.

Alternatively, we can transform the scale:
R> h3 = h1 + scale_x_log10()
R> ##Or equivalently
R> h1 + scale_x_continuous(trans="log10")
to get figure 3.19. Figures 3.18 and 3.19 are identical, but in figure 3.19 500

we are still using the original scale. To generate figure 3.19 we used
scale_x_log10() this is a convenience function of the 400

scale_x_continuous(trans=‘log10’) function. Some standard scale

300
transformations are given in table 3.4. As an aside, the scale functions
length

are fundamentally different from geom’s, since they don’t add a layer 200

to the plot.
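The two approaches, mapping log10(budget) directly versus adding scale_x_log10(), can be compared on a toy data set. The data frame toy is made up for illustration, and layer_data() is assumed to be available (recent ggplot2 versions only).

```r
library(ggplot2)

# Made-up budgets and lengths, for illustration only
toy <- data.frame(budget = c(1e3, 1e4, 1e5, 1e6, 1e7),
                  length = c(80, 95, 100, 120, 150))

# Option 1: apply log10 inside the aesthetic mapping
p_data <- ggplot(toy, aes(x = log10(budget), y = length)) + geom_point()

# Option 2: map the raw variable and put the x-axis on a log10 scale
p_scale <- ggplot(toy, aes(x = budget, y = length)) + geom_point() +
  scale_x_log10()

# Internally both plots place the points at the same x positions;
# only the axis labelling differs
x_data <- layer_data(p_data)$x
x_scale <- layer_data(p_scale)$x
same_positions <- isTRUE(all.equal(x_data, x_scale))
```

Printing p_data and p_scale side by side shows the same point pattern, with the second plot labelled in the original budget units.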
The scale_* functions can also adjust the tick marks and labels.
For example,
R> h4 = h3 +
+ scale_y_continuous(breaks=seq(0, 500, 100),
+ limits=c(0, 500),
+ minor_breaks=seq(0, 500, 25),
+ labels=c(0, "", "", "", "", 500),
+ name="Movie Length")
gives figure 3.20. If you just want to change the x-axis limits or name,
then you can use the convenience functions xlim and xlab. There are
similar functions for the y-axis.

Figure 3.19: Scatter plot of movie budget against length, with the budget data transformed.

Figure 3.20: Scatter plot of movie budget against length. Using scale_y_continuous gives us more control of tick marks and grid lines.

The above description of axis scales is based on what happened
in version 0.8.9. However, version 0.9 seems to be slightly different, but
isn’t yet finalised. In particular, in version 0.9 the default grid lines when
using a log transformation don’t appear as a regular grid.

3.8 Other topics

There are a few topics that I have skipped, mainly due to space and
time.

themes: if you want to make consistent changes to all your plots - say
reduce the font size, then you should use themes. One useful theme is
theme_bw(). This can be set globally using theme_set(theme_bw())
or using the standard notation: + theme_bw().

coordinate systems: unlike transforming data or scales, transforming
the coordinate system transforms the appearance of the geoms. For
example, a rectangle becomes a doughnut; in a map projection, the
shortest path will no longer be a straight line. See §7.3 of the ggplot2
book for further details.

Multiple plots: this includes having sub-figures on top of larger
figures or multiple plots on a single page. See §8.4 in the ggplot2
book.

Legend manipulation: changing legend titles and positions.

There is also a geom_map for plotting maps. However, I haven’t
really used this in earnest. There is also a ggmap package that might
be worth looking at.
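As a brief illustration of the themes item above, the sketch below applies theme_bw() to a single plot and then globally. It uses the built-in mtcars data as a placeholder; the base_size value is an arbitrary choice for the example.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Per plot: add the theme like any other component, here with a
# smaller base font size
p_bw <- p + theme_bw(base_size = 10)

# Globally: theme_set() returns the previous theme, so we can
# restore it once we are done
old_theme <- theme_set(theme_bw())
# ... plots printed here would use theme_bw() by default ...
theme_set(old_theme)
```

Keeping the value returned by theme_set() is a convenient way to avoid leaking the change into later plots.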
4
A few other things

4.1 The dot geom

You can think of a dot plot as a one-dimensional scatter plot, where tied
values are perturbed. There are two basic algorithms for generating a
dot plot.

1. dot density: uses a kernel density estimation algorithm to position
dots.

2. “histodot”: has regular spacing between stacks.

The dots in a dot plot can be manipulated in a variety of ways:

1. The size of a dot.

2. Dots can be stacked in different ways (see the stackdir argument).

3. Altering the closeness of dots (see the stackratio argument).

To create a dot plot, we use the dotplot geom:
R> ##default dotdensity method
R> g = ggplot(mtcars, aes(x = mpg))
R> g + geom_dotplot(binwidth = 1.5)
to create figure 4.1. The binwidth argument sets the (maximum) bin
width; the dot diameter is scaled relative to it. The other standard
method for constructing dot plots is the histodot method:
R> g + geom_dotplot(method="histodot", binwidth = 1.5)
to get figure 4.2.

Figure 4.1: A dot plot of mpg using geom_dotplot. This is the default dotplot, using the dotdensity method.

Figure 4.2: A dot plot of mpg using geom_dotplot. This plot was constructed using the histodot method.

Dot plots are particularly useful when combined with boxplots. Using
the tips data set again, we get
R> h = ggplot(tips, aes(x=size, y=tip)) +
+ geom_boxplot(aes(group=size)) +
+ geom_dotplot(aes(group=size),
+ binaxis="y", stackdir="center",
+ colour="blue", fill="blue",
+ binwidth=0.05, stackratio=0.5)
to get figure 4.3.

Figure 4.3: Box- and dot-plots of the tips data set conditional on table size.

4.2 The error bar geom

The error bar geom provides a mechanism for adding error bars to your
plot. Three closely related geoms are:

geom_pointrange: range indicated by a straight line, with a point in
the middle.

geom_linerange: range indicated by a straight line.

geom_errorbarh: a horizontal geom_errorbar.

The error bar geom has three required aesthetics: x, ymin and ymax.
Suppose we wanted to create a graphic where the error bars are the
mean tip size ± two standard errors. First, we calculate some
summary statistics
R> tip_m = tapply(tips$tip, tips$size, mean)
R> tip_sd = tapply(tips$tip, tips$size, sd)
R> tip_l = tapply(tips$tip, tips$size, length)
Next we create a vector containing the standard errors multiplied by
the corresponding t statistic, i.e.

$$t_{n-1} \frac{s}{\sqrt{n}}$$

to get
R> tip_se = qt(0.975, tip_l - 1) * tip_sd/sqrt(tip_l)
Then we put this data into a data frame
R> df = data.frame(x = 1:6,
+ ymin = tip_m - tip_se,
+ ymax = tip_m + tip_se,
+ m = tip_m)
Notice that we prepare the data before attempting to use ggplot2.
Remember, ggplot2 doesn’t try to manipulate the data, that’s up to
you! Now that the data is in the correct form, we can apply the errorbar
geom (figure 4.4):
R> h1 = ggplot(df) +
+ geom_errorbar(aes(x=x, ymin=ymin, ymax=ymax))

Figure 4.4: A 95% confidence interval for the mean tip amount, conditional on group size.

We could go a step further and add a bar plot layer
R> h2 = ggplot(df) +
+ geom_errorbar(aes(x=x, ymin=ymin, ymax=ymax)) +
+ geom_bar(aes(x=x, y=m), stat="identity")
to get figure 4.5. (Personally, I really dislike these plots. See for
example http://goo.gl/RvAaK and http://goo.gl/jGTUs.)

Figure 4.5: As figure 4.4, but with a bar plot layer (aka dynamite plot).

If we want to add a dot to the error bar to represent the mean or
median, then we just use geom_point to create an additional layer.
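The interval calculation can be sanity-checked in base R. The sketch below computes the mean plus or minus the t quantile (on n - 1 degrees of freedom) times the standard error for a single made-up group, and compares it with the interval reported by t.test. The vector x is invented for illustration and is not taken from the tips data.

```r
# Made-up tip amounts for a single table size, for illustration only
x <- c(2.5, 3.0, 3.5, 4.0, 5.2, 2.8)
n <- length(x)

# Half-width of the 95% interval: t_{n-1} * s / sqrt(n)
half_width <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
ci_manual <- c(mean(x) - half_width, mean(x) + half_width)

# t.test() reports the same two-sided 95% interval
ci_ttest <- as.numeric(t.test(x)$conf.int)

print(rbind(ci_manual, ci_ttest))
```

The two rows agree, which confirms that the degrees of freedom passed to qt should be n - 1 rather than n.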
Figure 4.6: An example plot using viewports. The top plot spans two columns.

4.3 Multiple plots

When we want to create a figure in base graphics that contains multiple
plots, we use the par function. For example, to create a 2 × 2 plot, we
would use
R> par(mfrow=c(2, 2))
In ggplot2, we can do something similar. Using the gridExtra package,
we have
R> library(gridExtra)
R> grid.arrange(g1, g2, g3, g4, nrow=2)
where g1, g2, g3 and g4 are standard ggplot2 graph objects.
An alternative way of creating figure grids is to use viewports. Using
viewports gives you more flexibility, but is more complicated. First,
we load the grid package and create a convenience function

R> library(grid)
R> vplayout = function(x, y)
+ viewport(layout.pos.row = x, layout.pos.col = y)
Next we create a new page, with a 2 × 2 layout
R> grid.newpage()
R> pushViewport(viewport(layout = grid.layout(2, 2)))
Finally, we add the individual graphics. The plot created using the h
object is placed on the first row and spans both columns:
R> print(h, vp = vplayout(1, 1:2))
The other figures are placed on the second row (figure 4.6):
R> print(h1, vp = vplayout(2, 1))
R> print(h2, vp = vplayout(2, 2))
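A layout like figure 4.6 can also be described with gridExtra alone. The sketch below assumes a recent gridExtra in which arrangeGrob accepts a layout_matrix argument (this argument is not in the version 0.9 listed in these notes). The three plots are placeholders built from mtcars, standing in for h, h1 and h2.

```r
library(ggplot2)
library(gridExtra)

# Three placeholder plots standing in for h, h1 and h2
top <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
left <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()
right <- ggplot(mtcars, aes(x = disp, y = mpg)) + geom_point()

# Row 1: plot 1 spans both columns; row 2: plots 2 and 3 side by side
lay <- rbind(c(1, 1),
             c(2, 3))
fig <- arrangeGrob(top, left, right, layout_matrix = lay)

# grid::grid.draw(fig) would render the figure on the current device
```

The viewport approach remains the more flexible of the two, since each plot can be placed in an arbitrary region of the page.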
5
Reshaping data

A common problem is that we receive data that requires restructuring.

In this chapter we will look at common ways of restructuring data to
make it more amenable to R manipulation.

5.1 The melt function

Sometimes a single variable is found in multiple columns. For example,
consider table 5.1.

Table 5.1: Some example patient data. Patients 1–3 are columns in the table.

      Patient
Gene  1     2     3
A     10.1  15.2  20.5
B      9.1  10.2   8.7
C      5.6   4.8   5.1

The patient variable is spread across three columns. After inputting
the data into R, we have the following data frame
R> patient
  Gene Patient1 Patient2 Patient3
1    A     10.1     15.2     20.5
2    B      9.1     10.2      8.7
3    C      5.6      4.8      5.1
To (easily) combine the patients into a single column we use the melt
function, which is part of the reshape2 package. (The melt function also
works with lists and arrays. You can also specify how it should handle
missing values.) Using the melt function is a three step process:

1. Identify the columns that we want untransformed. In this example,
we want to protect the column Gene.

2. Melt the remaining columns into a single column; use the id argument
to identify the protected columns
R> pat_comb = melt(patient, id=c("Gene"))
which gives
R> pat_comb
  Gene variable value
1    A Patient1  10.1
2    B Patient1   9.1
3    C Patient1   5.6
4    A Patient2  15.2
5    B Patient2  10.2
6    C Patient2   4.8
7    A Patient3  20.5
8    B Patient3   8.7
9    C Patient3   5.1

3. Rename the columns (if appropriate).
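The renaming step can also be folded into the call to melt. The sketch below rebuilds the data from table 5.1 and uses the variable.name and value.name arguments of reshape2's melt; the new column names Patient and Expression are arbitrary choices for this example.

```r
library(reshape2)

# The data from table 5.1
patient <- data.frame(Gene = c("A", "B", "C"),
                      Patient1 = c(10.1, 9.1, 5.6),
                      Patient2 = c(15.2, 10.2, 4.8),
                      Patient3 = c(20.5, 8.7, 5.1))

# Melt and name the two new columns in a single call
pat_comb <- melt(patient, id.vars = "Gene",
                 variable.name = "Patient", value.name = "Expression")

print(head(pat_comb, 3))
```

Alternatively, the columns of an already melted data frame can be renamed afterwards with colnames.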

6
R setup

The examples in the notes are generated using the following R setup:
R> version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          14.2
year           2012
month          02
day            29
svn rev        58522
language       R
version.string R version 2.14.2 (2012-02-29)
To obtain the version number of a particular package, use
R> packageDescription("ggplot2")$Version
[1] "0.9.0"
The packages used in these notes are given in table 6.1.

Table 6.1: List of packages used in these notes.

Package    Version
ggplot2    0.9.0
grid       2.14.2
gridExtra  0.9
hexbin     1.26.0
reshape2   1.2.1
scales     0.2.0

Note that version 0.9 of ggplot2 introduced a number of new features
that weren’t available in previous versions.
Bibliography

P Murrell. R Graphics. CRC Press, 2nd edition, 2011.

D Sarkar. Lattice: Multivariate Data Visualization with R (Use R!). Springer, 1st edition, 2008.

H Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer, New York, 2009. ISBN 978-0-387-98140-6.

L Wilkinson. The Grammar of Graphics. Springer, 1st edition, 1999.

