R Workshop
- a quick start -
September 2004
Contents
1 An Introduction to R
1.1 Downloading and Installing R
1.2 Getting Started
1.3 Statistical Distributions
1.4 Writing Custom R Functions
2 Linear Models
2.1 Fitting Linear Models in R
2.2 Generalized Linear Models
2.3 Extensions
4 Advanced Graphics
4.1 Customizing Plots
4.2 Mathematical Annotations
4.3 Three-Dimensional Plots
4.4 RGL: 3D Visualization in R using OpenGL
A R functions
A.1 Mathematical Expressions (expression())
A.2 The RGL Functionset
Preface
The notes comprise four sections, which build on each other and should therefore be read sequentially. The first section (An Introduction to R) introduces the most basic concepts. Occasionally things are simplified and restricted to the minimum background in order to avoid obscuring the main ideas by offering too much detail. The second and the third section (Linear Models and Time Series Analysis) illustrate some standard R commands pertaining to these two common statistical topics. The fourth section (Advanced Graphics) covers some of the excellent graphical capabilities of the package.
Throughout the text typewriter font is used for annotating R functions and options. R functions are given with brackets, e.g. plot(), while options are typed in italic typewriter font, e.g. xlab="x label". R commands which are entered by the user are printed in red and the output from R is printed in blue.
The datasets used are available from the URI http://134.76.173.220/R workshop.
Chapter 1
An Introduction to R
1.2 Getting Started
When
2+3
is typed into the console, R adds 3 to 2 and displays the result. Other simple operators
include
2-3 # Subtraction
2*3 # Multiplication
2/3 # Division
2^3 # Exponentiation (2 to the power of 3)
sqrt(3) # Square roots
log(3) # Logarithms (to the base e)
(2 - 3) * 3
test <- 2 * 3
performs the operation on the right-hand side (2*3) and then stores the result as an object named test. (One can also use = or even -> for assignments.) Further operations can be carried out on objects, e.g.
2 * test
multiplies the value stored in test by 2. Note that objects are overwritten without notice. The command ls() outputs the list of currently defined objects.
Data types
As in other programming languages, there are different data types available in R,
namely numeric, character and logical. As the name indicates, numeric
is used for numerical values (double precision). The type character is used for
characters and is generally entered using quotation marks:
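The three data types can be illustrated with a minimal sketch. The object names x_num, x_chr and x_lgl are chosen here purely for illustration and do not appear elsewhere in these notes.

```r
# Hypothetical object names, used only for this illustration
x_num <- 3.14        # numeric (double precision)
x_chr <- "hello"     # character, entered in quotation marks
x_lgl <- 2 < 3       # logical, here the result of a comparison

class(x_num)
class(x_chr)
class(x_lgl)
```

The function class() reports the type of each object.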
Object types
Depending on the structure of the data, R recognises 4 standard object types: vectors, matrices, data frames and lists. Vectors are one-dimensional arrays of data; matrices are two-dimensional data arrays. Data frames and lists are further generalizations and will be covered in a later section.
Creating vectors in R
There are various means of creating vectors in R. E.g. in case one wants to save the numbers 3, 5, 6, 7, 1 as mynumbers, one can use the c() command:
mynumbers <- c(3, 5, 6, 7, 1)
Further operations can then be carried out on the R object mynumbers. Note that arithmetic operations on vectors (and matrices) are carried out component-wise, e.g. mynumbers*mynumbers returns the squared value of each component of mynumbers.
Sequences can be created using either : or seq():
1:10
creates a vector containing the numbers 1, 2, 3, ..., 10. The seq() command allows the increments of the sequence to be specified:
seq(0.5, 2.5, 0.5)
creates a vector containing the numbers 0.5, 1, 1.5, 2, 2.5. Alternatively one can specify the length of the sequence:
seq(0.5, 2.5, length = 100)
creates a sequence from 0.5 to 2.5 with the increments chosen such that the resulting sequence contains 100 equally spaced values.
Creating matrices in R
One way of creating a matrix in R is to convert a vector of length n·m into an n × m matrix:
Note that the matrix is created column-wise; for row-wise construction one has to use the option byrow=TRUE:
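As a small sketch of the two filling orders, the following converts a vector of length 6 into a 2 × 3 matrix. The vector and the object names are assumptions made for this illustration, not the example originally used in the notes.

```r
v <- 1:6                                               # illustrative vector of length 2*3
m_col <- matrix(v, nrow = 2, ncol = 3)                 # filled column by column (default)
m_row <- matrix(v, nrow = 2, ncol = 3, byrow = TRUE)   # filled row by row

m_col
m_row
```

In m_col the first column holds 1 and 2, whereas in m_row the first row holds 1, 2 and 3.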
3. Plot x against y
Graphs of functions are created in essentially the same way in R, e.g. plotting the function f(x) = sin(x) in the range from −π to π can be done as follows:
[Figure 1.1: Plots of y = sin(x) against x; left panel with few x-values, right panel with 1000 x-values.]
The output is shown in the left part of figure 1.1. However, the graph does not look very appealing since it lacks smoothness. A simple trick for improving the graph is to increase the number of x-values at which f(x) is evaluated, e.g. to 1000:
The result is shown in the right part of figure 1.1. Note the use of the option type="l", which causes the graph to be drawn with connecting lines rather than points.
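The two plots described above might be produced along the following lines. This is a sketch only: the number of x-values in the coarse version (10) and the object names are assumptions, since the notes state only that the smooth version uses 1000 values.

```r
# Coarse version: few x-values (the count of 10 is an assumption)
x <- seq(-pi, pi, length = 10)
y <- sin(x)
plot(x, y, type = "l")

# Smooth version: 1000 x-values, as described in the text
x_fine <- seq(-pi, pi, length = 1000)
y_fine <- sin(x_fine)
plot(x_fine, y_fine, type = "l")
```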
1.3 Statistical Distributions
E.g., random numbers (r) from the normal distribution (norm) can be drawn using the rnorm() function; quantiles (q) of the χ² distribution (chisq) are obtained with qchisq().
The following examples illustrate the use of the R functions for computations involving statistical distributions:
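A few calls of this kind are sketched below. The particular arguments are chosen for illustration and are not necessarily those used in the original examples; the prefixes d, p, q and r select density, distribution function, quantile and random numbers respectively.

```r
set.seed(1)                   # only to make the random sample reproducible
mysample <- rnorm(100)        # r: 100 random numbers from N(0, 1)

dnorm(0)                      # d: density of N(0, 1) at 0
pnorm(1.96)                   # p: P(X <= 1.96) for X ~ N(0, 1)
qnorm(0.975)                  # q: 97.5% quantile of N(0, 1)
qchisq(0.95, df = 1)          # q: 95% quantile of the chi-squared distribution, 1 df

# Histogram on the density scale with a kernel density estimate
hist(mysample, freq = FALSE)
lines(density(mysample))
```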
Figure 1.2: Histogram of normally distributed random numbers and fitted density.
Another example (figure 1.3) is the visualization of the approximation of the binomial distribution with the normal distribution for e.g. n = 50 and π = 0.25:
Figure 1.3: Comparing the binomial distribution with n = 50 and π = 0.25 with an approximation by the normal distribution (μ = nπ, σ = √(nπ(1 − π))).
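One possible way of producing such a comparison is sketched below. The object names and the plotting style are assumptions; only n = 50, the success probability 0.25 and the formulae for the mean and standard deviation are taken from the text.

```r
n <- 50
p <- 0.25                         # success probability (written as pi in the text)
k <- 0:n

# Binomial probabilities as vertical bars
plot(k, dbinom(k, size = n, prob = p), type = "h")

# Normal approximation with matching mean and standard deviation
mu    <- n * p
sigma <- sqrt(n * p * (1 - p))
curve(dnorm(x, mu, sigma), add = TRUE, col = "red")
```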
Since R doesn't have a function for computing the geometric mean, we have to write our own function geo.mean():
fix(geo.mean)
function(x)
{
  n <- length(x)
  gm <- exp(mean(log(x)))   # geometric mean via the mean of the logs
  return(gm)
}
Note that R checks the function after the editor window has been closed and saved. In case of structural errors (most commonly missing brackets), R reports these to the user. In order to fix the error(s), one has to enter
geo.mean <- edit()
since the (erroneous) results are not saved. (Using fix(geo.mean) results in losing the last changes.)
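As an alternative to the editor window, the same function can be defined directly at the console. The sketch below assumes strictly positive input values, since the logarithm is otherwise undefined.

```r
# Geometric mean, computed as the exponential of the mean of the logs
# (assumes all elements of x are strictly positive)
geo.mean <- function(x) {
  exp(mean(log(x)))
}

geo.mean(c(1, 10, 100))
```

For the input 1, 10, 100 the geometric mean is the cube root of 1000, i.e. 10 (up to floating point rounding).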
Chapter 2
Linear Models
The command read.table() reads a file into R assuming that the data is structured as a matrix (table). It assumes that the entries of a row are separated by blank spaces (or any other suitable separator) and the rows are separated by line feeds. The option header=TRUE tells R that the first row is used for labelling the columns.
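The behaviour of read.table() can be tried without a data file by passing the table as text. In the sketch below the values and the column names x3 and x4 are invented for illustration; they are not the contents of the workshop dataset.

```r
# Invented example data: first row holds the column labels
raw <- "grip arm x3 x4
105 80 31 1.2
106 68 35 0.9
97 83 34 1.1"

strength_demo <- read.table(text = raw, header = TRUE)

names(strength_demo)   # column labels taken from the first row
dim(strength_demo)     # 3 rows, 4 columns
```

When reading from disk, the text argument is replaced by the file name as first argument.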
In order to get an overview of the relation between the 4 (quantitative) variables, one can use
pairs(strength)
$\text{grip}_i = \beta_0 + \beta_1 \, \text{arm}_i + e_i$
2.1 Fitting Linear Models in R
The function lm() returns a list object which we have saved under some name, e.g. as fit. As previously mentioned, lists are a generalized object type; a list can contain several objects of different types and modes arranged into a single object. The names of the entries stored in a list can be viewed using
names(fit)
One entry in this list is coefficients, which contains the coefficients of the fitted model. The coefficients can be accessed using the $ sign:
fit$coefficients
returns a vector (in this case of length 2) containing the estimated parameters (the estimates of β₀ and β₁). Another entry is residuals, which contains the residuals of the fitted model:
res <- fit$residuals
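The workflow described above can be reproduced on simulated data. In the following sketch the data are generated artificially (the true coefficients 55 and 0.7 and the noise level are arbitrary choices), so the numbers do not correspond to the strength dataset.

```r
set.seed(42)                                   # reproducible simulated data
arm  <- runif(50, 20, 120)                     # artificial covariate
grip <- 55 + 0.7 * arm + rnorm(50, sd = 20)    # artificial response

fit <- lm(grip ~ arm)      # simple linear regression of grip on arm

names(fit)                 # entries of the returned list
fit$coefficients           # estimated intercept and slope
res <- fit$residuals       # residuals of the fitted model
```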
Before looking further at our fitted model, let us briefly examine the residuals.
A first insight is given by displaying the residuals as a histogram:
The function density() computes the kernel density estimate (other methods for kernel density estimation will be discussed in a later section).
Here one might also wish to add the pdf of the normal distribution to the graph:
mu <- mean(res)
sigma <- sd(res)
x <- seq(-60, 60, length = 500)
y <- dnorm(x, mu, sigma)
lines(x, y, col = 6)
Figure 2.1: Histogram of the model residuals with kernel density estimate and fitted normal distribution.
There also exist alternative ways for (graphically) investigating the normality of a sample, for example QQ-plots:
qqnorm(res)
Another option is to specifically test for normality, e.g. using the Kolmogorov-Smirnov test or the Shapiro-Wilk test:
ks.test(res, "pnorm", mu, sigma)
performs the Kolmogorov-Smirnov test on res. Since this test can be used for any distribution, one has to specify the distribution (pnorm) and its parameters (mu and sigma). The Shapiro-Wilk test specifically tests for normality, so one only has to specify the data:
shapiro.test(res)
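The normality checks can be tried on an artificial sample as sketched below. The sample res_demo is simulated and merely stands in for model residuals. Note that the p-value of the Kolmogorov-Smirnov test is only approximate when the parameters are estimated from the same data.

```r
set.seed(1)
res_demo <- rnorm(200, sd = 20)    # simulated stand-in for residuals

mu    <- mean(res_demo)
sigma <- sd(res_demo)

qqnorm(res_demo)                   # normal QQ-plot
qqline(res_demo)                   # reference line through the quartiles

ks <- ks.test(res_demo, "pnorm", mu, sigma)   # Kolmogorov-Smirnov test
sw <- shapiro.test(res_demo)                  # Shapiro-Wilk test

ks$p.value
sw$p.value
```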
[Figure: Normal Q-Q plot of the residuals (sample quantiles against theoretical quantiles).]
Now back to our fitted model. In order to display the observations together with
the fitted model, one can use the following code which creates the graph shown
in figure 2.3:
plot(strength[,2], strength[,1])
betahat <- fit$coefficients
x <- seq(0, 200, length = 500)
y <- betahat[1] + betahat[2]*x
lines(x, y, col = "red")
summary(fit)
returns an output containing the values of the coefficients and other information:
[Figure 2.3: strength[, 1] plotted against strength[, 2] together with the fitted model.]
Call:
lm(formula = strength[, 1] ~ strength[, 2])

Residuals:
     Min       1Q   Median       3Q      Max
-49.0034 -11.5574   0.4104  12.3367  51.0541

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   54.70811    5.88572   9.295   <2e-16 ***
strength[, 2]  0.70504    0.07221   9.764   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Variance
Consider the dataset miete.dat, which contains rent prices for apartments in a
German city. Two factors were also recorded: year of construction, and whether
the apartment was on the ground floor, first floor, ..., fourth floor. Again, the data
can be imported into R using the read.table() command:
In this case, rent is a data frame comprising three columns: the first one (Baujahr) indicates the year of construction, the second one (Lage) indicates the floor and the third column contains the rent prices (Miete) per square metre. We can start by translating the German labels into English:
names(rent)
returns the names of the columns of rent. The names can be changed by typing
Here we have saved the third column of rent as price and the second one as fl. The command levels(fl) shows us the levels of fl (a to e). It is possible to perform queries using square brackets, e.g.
price[fl=="a"]
returns the prices for the apartments on floor a (ground floor in this case). Accordingly,
fl[price<7]
returns the floor levels whose corresponding rent prices (per m²) are less than 7 (Euro). These queries can be further expanded using logical AND (&) or OR (|) operators:
fl[price > 5 & price < 7]
returns all floor levels whose corresponding rent prices are between 5 and 7 (Euro).
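The queries can be tried on a few invented values. The vectors below are made up for illustration and are not taken from miete.dat.

```r
# Invented prices and floor levels
price <- c(6.5, 4.8, 7.2, 5.9, 8.1)
fl    <- factor(c("a", "b", "a", "c", "b"))

price[fl == "a"]              # prices on floor a: 6.5 and 7.2
fl[price < 7]                 # floors with price below 7: a, b, c
fl[price > 5 & price < 7]     # floors with price between 5 and 7: a, c
```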
A convenient function is split(a,b), which splits the data a by the levels given in b. This can be used together with the function boxplot():
boxplot(split(price, fl))
Accordingly, the relation between the year of construction and the price can be
visualized with
Figure 2.4: Boxplots of the rent example. The left boxplot displays the relation between price and floor; the right boxplot shows the relation between price and year.
The analysis of variance can be carried out in two ways, either by treating it as a
linear model (lm()) or by using the function aov(), which is more convenient
in this case:
returns
Call:
lm(formula = price ~ fl)

Residuals:
    Min      1Q  Median      3Q     Max
-4.4132 -1.2834 -0.1463  1.1717  6.2987

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.8593     0.1858  36.925   <2e-16 ***
flb           0.0720     0.2627   0.274    0.784
flc          -0.2061     0.2627  -0.785    0.433
fld           0.0564     0.2627   0.215    0.830
fle          -0.1197     0.2627  -0.456    0.649
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.858 on 495 degrees of freedom
Multiple R-Squared: 0.003348, Adjusted R-squared: -0.004706
F-statistic: 0.4157 on 4 and 495 DF, p-value: 0.7974
returns
Df Sum Sq Mean Sq F value Pr(>F)
fl 4 5.74 1.43 0.4157 0.7974
Residuals 495 1708.14 3.45
The full model (i.e. including year and floor as well as interactions) is analysed
with
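A one-way and a two-way analysis of variance with interaction can be sketched on simulated data as follows. The data, the factor year with two levels and the object names are assumptions made for this illustration; the formula price ~ fl * year is one common way of requesting both main effects and their interaction.

```r
set.seed(7)
fl    <- gl(5, 20, labels = letters[1:5])            # floor: 5 levels, 20 obs each
year  <- gl(2, 10, 100, labels = c("old", "new"))    # hypothetical 2-level factor
price <- 7 + rnorm(100)                              # simulated prices, no true effects

fit_one  <- aov(price ~ fl)          # one factor only
fit_full <- aov(price ~ fl * year)   # both factors and their interaction

summary(fit_one)
summary(fit_full)
```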
Analysis of covariance
The extension to the analysis of covariance is straightforward. The dataset car is
based on data provided by the U.S. Environmental Protection Agency (82 cases).
It contains the following variables:
BRAND Car manufacturer
HP Engine horsepower
attach(car)
or aov(). The distribution of the response needs to be specified, as does the link function, which expresses the monotone function of the conditional expectation of the response variable that is to be modelled as a linear combination of the covariates.
In order to obtain help for an R function, one can use the built-in help system of R:
?glm or help(glm)
Typically, the help document contains information on the structure of the function, an explanation of the arguments, references, examples etc. In case of glm(), there are several examples given. The examples can be examined by copying the code and pasting it into the R console. For generalized linear models, the information retrieved by
?family
The first three lines are used to create an R object with the data. The fourth line
(print()) displays the created data; the fitting is done in the fifth line with
glm(). The last two lines (anova() and summary()) are used for displaying
the results.
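The description above appears to match the first example on the help page of glm() (a Poisson regression on count data), which is reproduced here under that assumption.

```r
# Lines 1-3: create the data
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
# Line 4: display the created data
print(d.AD <- data.frame(treatment, outcome, counts))
# Line 5: fit the generalized linear model (Poisson family, log link)
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
# Lines 6-7: display the results
anova(glm.D93)
summary(glm.D93)
```

The family argument specifies both the distribution of the response and, through its default, the link function.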
2.3 Extensions
An important feature of R is its extension system. Extensions for R are delivered as packages (libraries), which can be loaded within R using the library() command. Usually, packages contain functions, datasets, help files and other files such as DLLs. (Further information on creating custom packages for R can be found on the R website.)
There exist a number of packages that offer extensions for linear models. The package mgcv contains functions for fitting generalized additive models (gam()); routines for nonparametric density estimation and nonparametric regression are
For example, fitting a GAM to the car dataset can be carried out as follows:
library(mgcv)
fit <- gam(MPG ~ s(HP))
summary(fit)
The first line loads the package mgcv, which contains the function gam(). In the second line the variable MPG (miles per gallon) is modelled as a smooth function of HP (engine horsepower). Note that the structure of GAM formulae is almost identical to the standard ones in R; the only difference is the use of s() for indicating smooth functions. A summary of the fitted model is again given by the summary() command.
Plotting the observations and the fitted model as shown in figure 2.5 can be done
in the following way:
plot(HP, MPG)
x <- seq(0, 350, length = 500)
y <- predict(fit, data.frame(HP = x))
lines(x, y, col = "red", lwd = 2)
In this case, the (generic) function predict() was used for predicting (i.e. obtaining y at the specified values of the covariate, here x).
[Figure 2.5: MPG plotted against HP together with the fitted GAM.]
A simple class of linear filters are moving averages with equal weights:
$T_t = \frac{1}{2a+1} \sum_{i=-a}^{a} X_{t+i}$
In this case, the filtered value of a time series at a given period $\tau$ is represented by the average of the values $\{x_{\tau-a}, \dots, x_{\tau}, \dots, x_{\tau+a}\}$. The coefficients of the filtering are $\{\frac{1}{2a+1}, \dots, \frac{1}{2a+1}\}$.
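The moving average can be checked on a short artificial series. The sketch below uses a = 1 (three equal weights of 1/3) and invented values; stats::filter() is called with its package prefix only to avoid confusion with functions of the same name in other packages.

```r
x <- c(2, 4, 6, 8, 10, 12, 14)     # invented series
a <- 1                             # window half-width

# 2a + 1 equal weights, centred on the current period
ma <- stats::filter(x, filter = rep(1 / (2 * a + 1), 2 * a + 1))
ma
```

The first and the last a values are NA, since the centred window does not fit at the ends of the series; the remaining values are 4, 6, 8, 10, 12.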
Consider the dataset tui, which contains stock data for the TUI AG from Jan. 3rd, 2000 to May 14th, 2002, namely date (1st column), opening values (2nd column), highest and lowest values (3rd and 4th column), closing values (5th column) and trading volumes (6th column). The dataset has been exported from Excel as a CSV file (comma-separated values). CSV files can be imported into R with the function read.csv():
The option dec specifies the decimal separator (in this case, a comma has been used as a decimal separator; this option is not needed when a dot is used as a decimal separator). The option sep specifies the separator used to separate entries of the rows.
3.1 Classical Decomposition
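Applying simple moving averages with a = 2, 12, and 40 to the closing values of the tui dataset implies using the following filters:
$a = 2: \quad \lambda_i = \{\tfrac{1}{5}, \tfrac{1}{5}, \tfrac{1}{5}, \tfrac{1}{5}, \tfrac{1}{5}\}$
$a = 12: \quad \lambda_i = \{\underbrace{\tfrac{1}{25}, \dots, \tfrac{1}{25}}_{25 \text{ times}}\}$
$a = 40: \quad \lambda_i = \{\underbrace{\tfrac{1}{81}, \dots, \tfrac{1}{81}}_{81 \text{ times}}\}$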
The resulting filtered values are (approximately) weekly (a = 2), monthly (a = 12) and quarterly (a = 40) averages of returns. Filtering is carried out in R with the filter() command.
[Figure 3.1: Closing values of the TUI shares and the moving averages.]
The following code was used to create figure 3.1 which plots the closing values
of the TUI shares and the averages, displayed in different colours.
plot(tui[,5], type = "l")
tui.1 <- filter(tui[,5], filter = rep(1/5, 5))
The data is read from C:/R workshop/beer.csv and then transformed with ts() into a ts object. This transformation is required for most of the time series functions, since a time series contains more information than the values themselves, namely information about dates and frequencies at which the time series has been recorded.
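The additional information carried by a ts object can be seen in a small sketch. The values below are invented; only the idea of monthly data starting in 1956 is borrowed from the beer example.

```r
# Invented monthly values (14 observations)
values <- c(93, 96, 95, 77, 70, 64, 70, 77, 80, 85, 88, 96, 94, 98)

# Monthly series starting in January 1956
beer_demo <- ts(values, start = c(1956, 1), frequency = 12)

frequency(beer_demo)   # number of observations per year
start(beer_demo)       # year and period of the first observation
end(beer_demo)         # year and period of the last observation
```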
[Figure: Decomposition plot with four panels (data, seasonal, trend, remainder) against time.]
Regression analysis
R offers the functions lsfit() (least squares fit) and lm() (linear models, a more general function) for regression analysis. This section focuses on lm(), since it offers more features, especially when it comes to testing significance of the coefficients.
Consider again the beer data. Assume that we want to fit the following model (a parabola) to the logs of beer:
$\log(X_t) = \alpha_0 + \alpha_1 t + \alpha_2 t^2 + e_t$
The fitting can be carried out in R with the following commands:
[Figure: lbeer against time with the fitted parabola.]
In the first command above, logs of beer are computed and stored as lbeer. Explanatory variables (t and t² as t and t2) are defined in the second and third row. The actual fit of the model is done using lm(lbeer ~ t + t2). The function lm() returns a list object, whose elements can be accessed using the $ sign: lm(lbeer ~ t + t2)$coefficients returns the estimated coefficients (the estimates of α₀, α₁ and α₂); lm(lbeer ~ t + t2)$fit returns the fitted values of the model.
Extending the model to
$\log(X_t) = \alpha_0 + \alpha_1 t + \alpha_2 t^2 + \beta \cos\left(\frac{2 \pi t}{12}\right) + \gamma \sin\left(\frac{2 \pi t}{12}\right) + e_t$
so that it includes the first Fourier frequency is straightforward. After defining
the two additional explanatory variables, cos.t and sin.t, the model can be
estimated in the usual way:
lbeer <- log(beer)
t <- seq(1956, 1995 + 7/12, length = length(lbeer))
t2 <- t^2
sin.t <- sin(2*pi*t)
cos.t <- cos(2*pi*t)
plot(lbeer)
lines(lm(lbeer ~ t + t2 + sin.t + cos.t)$fit, col = 4)
Note that in this case sin.t and cos.t do not include 12 in the denominator, since 1/12 has already been considered during the transformation of beer and the construction of t.
Figure 3.4: Fitting a parabola and the first Fourier frequency to lbeer.
summary(lm(lbeer ~ t + t2 + sin.t + cos.t))
Residuals:
Min 1Q Median 3Q Max
-0.2722753 -0.0686953 -0.0006432 0.0695916 0.2370383
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.734e+03 1.474e+02 -25.330 < 2e-16 ***
t 3.768e+00 1.492e-01 25.250 < 2e-16 ***
t2 -9.493e-04 3.777e-05 -25.137 < 2e-16 ***
sin.t -4.870e-02 6.297e-03 -7.735 6.34e-14 ***
cos.t 1.361e-01 6.283e-03 21.655 < 2e-16 ***
---
Apart from the coefficient estimates and their standard errors, the output also includes the corresponding t-statistics and p-values. In the output shown above, all five coefficients, α₀ (Intercept), α₁ (t), α₂ (t²), γ (sin.t) and β (cos.t), differ significantly from zero. (Even if one of the two trigonometric terms did not, one might include it anyway, since Fourier frequencies are usually taken in pairs of sine and cosine.)
3.2 Exponential Smoothing
$\hat{x}_n(1) = \lambda_0 x_n + \lambda_1 x_{n-1} + \dots$
In its basic form exponential smoothing is applicable to time series with no systematic trend and/or seasonal components. It has been generalized to the Holt-Winters procedure in order to deal with time series containing trend and seasonal variation. In this case, three smoothing parameters are required, namely α (for the level), β (for the trend) and γ (for the seasonal variation).
Thus, the exponential smoothing of the beer dataset can be performed as follows:
R offers the function predict(), which is a generic function for predictions from various models. In order to use predict(), one has to save the fit of a model to an object, e.g.:
beer.hw <- HoltWinters(beer)
In this case, we have saved the fit from the Holt-Winters procedure on beer as beer.hw.
predict(beer.hw, n.ahead = 12)
returns the predicted values for the next 12 periods (i.e. Sep. 1995 to Aug. 1996).
The following commands can be used to create a graph with the predictions for
the next 4 years (i.e. 48 months):
plot(beer, xlim=c(1956, 1999))
lines(predict(beer.hw, n.ahead = 48), col = 2)
[Figure: The beer series against time with the Holt-Winters predictions.]
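Since the beer data may not be at hand, the same steps can be tried on the monthly co2 series that ships with R. This substitution is an assumption made for illustration; the object names co2.hw and co2.pred are likewise invented.

```r
data(co2)                                   # monthly CO2 concentrations, 1959 to 1997

co2.hw   <- HoltWinters(co2)                # level, trend and seasonal smoothing
co2.pred <- predict(co2.hw, n.ahead = 12)   # predictions for the next 12 months

plot(co2, xlim = c(1959, 1999))             # leave room for the predictions
lines(co2.pred, col = 2)

co2.hw$alpha                                # estimated smoothing parameter for the level
```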
3.3 ARIMA Models
Introductory Remarks
Forecasting based on ARIMA (autoregressive integrated moving average) models, sometimes referred to as the Box-Jenkins approach, comprises the following stages:
i.) Model identification
ii.) Parameter estimation
iii.) Diagnostic checking
These stages are repeated iteratively until a satisfactory model for the given data
has been identified (e.g. for prediction). The following three sections show some
facilities that R offers for carrying out these three stages.
[Figure: ACF and partial ACF plots of the two simulated processes.]
The last four lines create the ACF and PACF plots of the two simulated processes.
Note that by default the plots include confidence intervals (based on uncorrelated
series).
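A simulation of this kind might look as follows. The orders and coefficients of the two processes (an AR(2) and an MA(2)), the series length and the object names are assumptions chosen for illustration.

```r
set.seed(123)

# Two simulated processes (coefficients are illustrative assumptions)
sim.ar <- arima.sim(list(ar = c(0.4, 0.4)), n = 1000)    # AR(2)
sim.ma <- arima.sim(list(ma = c(0.6, -0.4)), n = 1000)   # MA(2)

par(mfrow = c(2, 2))
acf(sim.ar,  main = "ACF of AR(2) process")
acf(sim.ma,  main = "ACF of MA(2) process")
pacf(sim.ar, main = "PACF of AR(2) process")
pacf(sim.ma, main = "PACF of MA(2) process")
```

For an AR process the partial ACF typically cuts off after the order of the process, while for an MA process the ACF does.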
data(LakeHuron)
fit <- arima(LakeHuron, order = c(1, 0, 1))
In this case fit is a list containing e.g. the coefficients (fit$coef), residuals
(fit$residuals) and the Akaike Information Criterion AIC (fit$aic).
Diagnostic Checking
A first step in diagnostic checking of fitted models is to analyse the residuals from the fit for any signs of non-randomness. R has the function tsdiag(), which produces a diagnostic plot of a fitted time series model:
tsdiag(fit)
It produces the output shown in figure 3.8: a plot of the residuals, the autocorrelation of the residuals and the p-values of the Ljung-Box statistic for the first 10 lags.
The Box-Pierce (and Ljung-Box) test examines the null hypothesis of independently distributed residuals. It is derived from the idea that the residuals of a correctly specified model are independently distributed. If the residuals are not, then they come from a misspecified model. The function Box.test() computes the test statistic for a given lag:
Box.test(fit$residuals, lag = 1)
Prediction of ARIMA Models
Once a model has been identified and its parameters have been estimated, one can predict future values of a time series. Let's assume that we are satisfied with
[Figure 3.8: Diagnostic plots produced by tsdiag(): standardized residuals, ACF of the residuals, and p-values of the Ljung-Box statistic.]
As with exponential smoothing, the function predict() can be used for predicting future values of the levels under the model:
LH.pred <- predict(fit, n.ahead = 8)
Here we have predicted the levels of Lake Huron for the next 8 years (i.e. until 1980). In this case, LH.pred is a list containing two entries, the predicted values LH.pred$pred and the standard errors of the prediction LH.pred$se. Using the familiar rule of thumb for an approximate confidence interval (95%) for the prediction, prediction ± 2·SE, one can plot the Lake Huron data, the predicted values and the corresponding approximate confidence intervals:
plot(LakeHuron, xlim = c(1875, 1980), ylim = c(575, 584))
LH.pred <- predict(fit, n.ahead = 8)
First, the levels of Lake Huron are plotted. In order to leave some space for adding the predicted values, the x-axis has been set to the interval 1875 to 1980 using the optional argument xlim=c(1875,1980); the use of ylim below is purely for visual enhancement. The prediction takes place in the second line using predict() on the fitted model. Adding the prediction and the approximate confidence interval is done in the last three lines. The confidence bands are drawn as a red, dotted line (using the options col="red" and lty=3):
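A complete version of the steps just described could look like the sketch below. The three lines() calls are reconstructed from the description (prediction, then upper and lower band at ± 2 standard errors) and are therefore an assumption about the original code.

```r
data(LakeHuron)
fit <- arima(LakeHuron, order = c(1, 0, 1))      # ARMA(1,1) model for the levels

plot(LakeHuron, xlim = c(1875, 1980), ylim = c(575, 584))
LH.pred <- predict(fit, n.ahead = 8)             # predictions for the next 8 years

lines(LH.pred$pred, col = "red")                               # predicted values
lines(LH.pred$pred + 2 * LH.pred$se, col = "red", lty = 3)     # upper band
lines(LH.pred$pred - 2 * LH.pred$se, col = "red", lty = 3)     # lower band
```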
[Figure: Lake Huron levels against time with predictions and approximate confidence bands.]
Advanced Graphics
4.1 Customizing Plots
Assume that we want to have a custom title and different labels for the axes as shown in figure 4.1. The relevant options are main (for the title), xlab and ylab (axes labels):
The title and the labels are entered as characters, i.e. in quotation marks. To include quotation marks in the title itself, a backslash is required before each quotation mark: \". The backslash is also used for some other commands, such as line breaks. Using \n results in a line feed, e.g.
main = "first part \n second part"
within a plot command writes first part in the first line of the title and second part in the second line.
Figure 4.1: Customizing the main title and the axes labels using main, xlab and ylab.
6 : Times font
The text size can be controlled using the cex option (character expansion). Again, the cex option also has subcategories such as cex.axis, cex.lab, cex.main and cex.sub. The size of text is specified using a relative value (e.g. cex=1 doesn't change the size, cex=0.8 reduces the size to 80% and cex=1.2 enlarges the size to 120%).
A complete list of graphical parameters is given in the help file for the par() command, i.e. by typing
?par
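Several of the options discussed in this section can be combined in one call, as sketched below. The plotted data (a kernel density estimate of simulated values) and the object names are assumptions made for illustration.

```r
set.seed(1)
x <- rnorm(200)                        # simulated data to have something to plot

# Custom title, axis labels and relative text sizes
plot(density(x), main = "custom title", xlab = "x label", ylab = "y label",
     cex.main = 1.2, cex.lab = 0.8)

# Title spread over two lines by means of \n
two_lines <- "first part\nsecond part"
plot(density(x), main = two_lines)

# Quotation marks inside a title, each preceded by a backslash
quoted <- "a \"quoted\" word"
plot(density(x), main = quoted)
```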
[Figure: Histogram of rnorm(500), drawn on the density scale.]
Specification of Colours
There exist various means of specifying colours in R. One way is to use the R names such as col="blue", col="red" or even col="mediumslateblue". (A complete list of available colour names is obtained with colours().) Alternatively, one can use numerical codes to specify the colours, e.g. col=2 (for red), col=3 (for green) etc. Colours can also be specified in hexadecimal code (as in HTML), e.g. col="#FF0000" denotes red. Similarly, one can use col=rgb(1,0,0) for red. The rgb() command is especially useful for custom colour palettes.
R offers a few predefined colour palettes. These are illustrated on the volcano data example below:
data(volcano)
par(mfrow = c(2, 2))
# heat.colors is passed explicitly; since R 3.6.0 it is no longer the default palette of image()
image(volcano, main = "heat.colors", col = heat.colors(15))
image(volcano, main = "rainbow", col = rainbow(15))
image(volcano, main = "topo", col = topo.colors(15))
image(volcano, main = "terrain.colors",
col = terrain.colors(15))
[Figure 4.3: Image maps of the volcano data with the colour palettes heat.colors, rainbow, topo.colors and terrain.colors.]
The resulting image maps with different colour palettes are shown in figure 4.3. The (internal) dataset volcano, containing topographic information for Maunga Whau on a 10m by 10m grid, is loaded by entering data(volcano). The command par(mfrow=c(a,b)) is used to split the graphics window into a × b regions (a rows and b columns). The image() function creates an image map of a given matrix.
4.2 Mathematical Annotations
$M_1: \text{MPG}_i = \beta_0 + \beta_1 \, \text{HP}_i + e_i$
$M_2: \text{MPG}_i = \beta_0 + \beta_1 \, \text{HP}_i + \beta_2 \, \text{HP}_i^2 + e_i$
Fitting the model and plotting the observations along with the two fitted models
is done with
M1 <- lm(MPG ~ HP)
HP2 <- HP^2
M2 <- lm(MPG ~ HP + HP2)
plot(HP, MPG, pch = 16)
x <- seq(0, 350, length = 500)
y1 <- M1$coef[1] + M1$coef[2]*x
y2 <- M2$coef[1] + M2$coef[2]*x + M2$coef[3]*x^2
lines(x, y1, col="red")
lines(x, y2, col="blue")
[Figure: MPG against HP with the fitted models M1 and M2; the legend is annotated with the model equations.]
Further options for annotating plots can be found in the examples given in the help documentation of the legend() function. A list of available expressions is given in the appendix.
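A legend with a mathematical annotation can be sketched as follows. The data are simulated (the car dataset is not required), and the exact wording of the annotation is an assumption; hat(), the subscript brackets and == are standard plotmath constructs listed in the appendix.

```r
set.seed(3)
HP  <- runif(60, 50, 300)                                  # simulated horsepower
MPG <- 60 - 0.2 * HP + 0.0003 * HP^2 + rnorm(60, sd = 3)   # simulated consumption

M1 <- lm(MPG ~ HP)
plot(HP, MPG, pch = 16)
abline(M1, col = "red")                                    # fitted straight line

# Annotation built with expression(); paste() joins text and formula
lab <- expression(paste("M1: ", hat(MPG) == hat(beta)[0] + hat(beta)[1] * HP))
legend("topright", legend = lab, col = "red", lty = 1)
```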
4.3 Three-Dimensional Plots
data(volcano)
persp(volcano)
persp(volcano, theta = 70, phi = 40)
[Figure: Perspective plots of the volcano data produced by the two persp() calls.]
The 3D space can be navigated by changing the parameters theta (azimuthal direction) and phi (colatitude).
Further options are illustrated in the example below:
par(mfrow=c(1,2))
# example 1:
persp(volcano, col = "green", border = NA, shade = 0.9,
theta = 70, phi = 40, ltheta = 120, box = FALSE,
axes = FALSE, expand = 0.5)
# example 2:
collut <- terrain.colors(101)
temp <- 1 + 100*(volcano-min(volcano)) /
(diff(range(volcano)))
mapcol <- collut[temp[1:86, 1:61]]
persp(volcano, col = mapcol, border = NA, theta = 70,
phi = 40, shade = 0.9, expand = 0.5, ltheta = 120,
lphi = 30)
[Figure: Perspective plots of the volcano data produced by examples 1 and 2.]
http://134.76.173.220/dadler/rgl/index.html
Further information on RGL can be found on the website and on the slides "RGL: An R-Library for 3D Visualization in R" (rgl.ppt) in your working directory.
Appendix A
R functions
A.1 Mathematical Expressions (expression())

Expression                          Result
list(x,y,z)                         $x, y, z$
list(x[1],...,x[n])                 $x_1, \dots, x_n$
x[1]+...+x[n]                       $x_1 + \cdots + x_n$
list(x[1],cdots,x[n])               $x_1, \cdots, x_n$
x[1]+ldots+x[n]                     $x_1 + \ldots + x_n$

RADICALS:
Expression                          Result
sqrt(x)                             $\sqrt{x}$
sqrt(x,y)                           $\sqrt[y]{x}$

SET RELATIONS:
Expression                          Result
x%subset%y                          $x \subset y$
x%subseteq%y                        $x \subseteq y$
x%supset%y                          $x \supset y$
x%supseteq%y                        $x \supseteq y$
x%notsubset%y                       $x \not\subset y$
x%in%y                              $x \in y$
x%notin%y                           $x \notin y$

ACCENTS:
Expression                          Result
hat(x)                              $\hat{x}$
tilde(x)                            $\tilde{x}$
ring(x)                             $\mathring{x}$
bar(x)                              $\bar{x}$
widehat(xy)                         $\widehat{xy}$
widetilde(xy)                       $\widetilde{xy}$

ARROWS:
Expression                          Result
x%<->%y                             $x \leftrightarrow y$
x%->%y                              $x \rightarrow y$
x%<-%y                              $x \leftarrow y$
x%up%y                              $x \uparrow y$
x%down%y                            $x \downarrow y$
x%<=>%y                             $x \Leftrightarrow y$
x%=>%y                              $x \Rightarrow y$
x%<=%y                              $x \Leftarrow y$
x%dblup%y                           $x \Uparrow y$
x%dbldown%y                         $x \Downarrow y$

SPACING:
Expression                          Result
x~~y                                $x \ y$
x+phantom(0)+y                      $x + \phantom{0} + y$
x+over(1,phantom(0))                $x + \frac{1}{\phantom{0}}$

FRACTIONS:
Expression                          Result
frac(x,y)                           $\frac{x}{y}$
over(x,y)                           $\frac{x}{y}$
atop(x,y)                           $x$ stacked above $y$ (no fraction bar)

STYLE:
Expression                          Result
displaystyle(x)                     $x$ in display style
textstyle(x)                        $x$ in text style
scriptstyle(x)                      $x$ in script style
scriptscriptstyle(x)                $x$ in scriptscript style

TYPEFACE:
Expression                          Result
plain(x)                            x in plain type
italic(x)                           x in italic type
bold(x)                             x in bold type
bolditalic(x)                       x in bold italic type

BIG OPERATORS:
Expression                          Result
sum(x[i],i==1,n)                    $\sum_{i=1}^{n} x_i$
prod(plain(P)(X==x),x)              $\prod_{x} \mathrm{P}(X = x)$
integral(f(x)*dx,a,b)               $\int_a^b f(x)\,dx$
union(A[i],i==1,n)                  $\bigcup_{i=1}^{n} A_i$
intersect(A[i],i==1,n)              $\bigcap_{i=1}^{n} A_i$
lim(f(x),x%->%0)                    $\lim_{x \to 0} f(x)$
min(g(x),x>=0)                      $\min_{x \ge 0} g(x)$
inf(S)                              $\inf S$
sup(S)                              $\sup S$

GROUPING:
Expression                          Result
(x+y)*z                             $(x + y)z$
x^y+z                               $x^y + z$
x^(y+z)                             $x^{(y+z)}$
x^{y+z}                             $x^{y+z}$
group("(",list(a,b),"]")            $(a, b]$
bgroup("(",atop(x,y),")")           $\left( {x \atop y} \right)$
group(lceil,x,rceil)                $\lceil x \rceil$
group(lfloor,x,rfloor)              $\lfloor x \rfloor$
group("|",x,"|")                    $|x|$
A.2 The RGL Functionset