R Notes

R is a free software package for statistical computing and graphics, developed collaboratively and similar to the S language. It offers extensive built-in statistical functionality, high-quality graphing capabilities, and is widely used among statisticians. Users can download R from CRAN, utilize various data structures, and extend its capabilities with additional packages.

Uploaded by

itsmenoname0123

What is R?

R is a software package for statistics and graphics that is free in two ways: it is free to
download and its source code is freely available (see www.r-project.org). More technically, R is a
language and environment for statistical computing and graphics, distributed in source code form
under the terms of the Free Software Foundation's GNU General Public License (www.gnu.org). The
current R is the result of a collaborative effort with contributions from all over the world. R was
initially written by Robert Gentleman and Ross Ihaka, also known as "R & R", of the Statistics
Department of the University of Auckland. Since mid-1997 there has been a core group with write
access to the R source (see www.r-project.org/contributors.html). R is similar to the S language
and environment, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies)
by John Chambers and colleagues.

Why use R?

• R has become the standard statistical software among statisticians. Consequently, new
statistical methods are often first available in R.
• There is a great deal of built-in statistical functionality and many add-on packages
available that extend the basic functionality.
• R creates fine statistical graphs with relatively little effort.
• R is very well designed.
• R software is of very high quality.
• R is easy to use.

Installation of R

R can be downloaded and installed from any of many available CRAN sites from the R
foundation website (www.r-project.org).

R console

When you first launch R, the R console window opens.

When you use the R program it issues a prompt when it expects input commands.
The default prompt is ‘>’, which on UNIX might be the same as the shell prompt. In
a command line interface, you type commands that you want to execute and press return. For
example, if you type the line 2+2 and press the return key, R will give you the result
[1] 4.
In this mode, R can be used as a very simple calculator for addition, subtraction, multiplication,
and division using the standard operators +, -, *, and /. This ability to enter commands is the
fundamental building block for using the R program. In R, a variable is a name that is assigned a
particular value. The variable names are then used in place of numbers to complete calculations.
Values can be assigned to variables using one of three operators: <-, =, and ->. You can assign
the number 5 to the variable v1 with any of the following commands.

> v1 <- 5
> v1 = 5
> 5 -> v1

Exit R

To exit R either type q( ) in the commands window or select File > Exit (that is, Exit on the File
menu). You will be asked if you wish to save the ‘workspace image’: a ‘no’ answer is
appropriate unless you have created some objects that you wish to use again.

A Few Important Syntax Conventions in R

R is case sensitive - so be very careful in the use of upper and lower case.

/ - forward slash is used in all path names (as opposed to the backward slash ‘\’).

‘ and “ (single and double quotes) are used interchangeably as long as they are paired.

() refers to functions and contains the arguments of the corresponding function.

[] refers to indexing and references row and/or column elements of a data structure.

# is used to type remarks

Getting help with functions and features


To get more information on any specific named function, for example to get the help for 'solve',
the command is
> help(solve)
An alternative is
> ?solve
For a feature specified by special characters, the argument must be enclosed in double or single
quotes, making it a “character string”. This is also necessary for a few words with syntactic
meaning, including if, for and function.
> help("[[")
The command
> help.start()
launches a Web browser that allows the help pages to be browsed with hyperlinks. The
‘Search Engine and Keywords’ link in the page loaded by help.start() is particularly useful as it
contains a high-level concept list which searches through available functions. The help.search
command (alternatively ??) allows searching for help in various ways. For example,
> ??solve
Try ?help.search for details and more examples. The examples on a help topic can normally be
run by
> example(topic)
Windows versions of R have other optional help systems; for details use
> ?help

R built-in data files

There are many built-in data files in R. You can type data( ) to see such data files. When you
type data( ), a new window called R data sets will appear, which includes the R data sets’ names
and a brief description for each data set. To look at the details of a specific data set, for example,
Titanic (Survival of passengers on the Titanic), you can type help(Titanic). A new window will
tell you the details of the data set you choose. To carry out a statistical analysis of a data
set, first load it into R with, e.g., data(Titanic). If you then type the data set name, e.g.
Titanic, R will print the data set in the console window.
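
As a minimal sketch of the steps just described, the following loads the Titanic data set and inspects it. It assumes a standard R installation, where Titanic is part of the 'datasets' package; the variable name total is only illustrative.

```r
# Load a built-in data set into the workspace
data(Titanic)

# Titanic is stored as a 4-dimensional contingency table
class(Titanic)
dimnames(Titanic)      # Class, Sex, Age, Survived

# Total number of people recorded in the table
total <- sum(Titanic)
total
```

Typing help(Titanic) alongside this shows how the table dimensions map to the variables described in the help page.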

R commands, case sensitivity, etc


R is an expression language with a very simple syntax. It is case sensitive, so A and a are
different symbols and would refer to different variables. Normally all alphanumeric symbols are
allowed. Elementary commands consist of either expressions or assignments. If an expression is
given as a command, it is evaluated, printed (unless specifically made invisible), and the value is
lost. An assignment also evaluates an expression and passes the value to a variable but the result
is not automatically printed. Commands are separated either by a semi-colon (‘;’), or by a
newline. Elementary commands can be grouped together into one compound expression by
braces (‘{’ and ‘}’). Comments can be put almost anywhere, starting with a hashmark (‘#’),
everything to the end of the line is a comment. If a command is not complete at the end of a line,
R will give a different prompt, by default + on second and subsequent lines and continue to read
input until the command is syntactically complete. This prompt may be changed by the user.
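
The rules above can be tried out with a few short commands. This is only an illustrative sketch; the variable names (a, A, total, grouped) are made up for the example.

```r
# Two commands on one line, separated by a semicolon;
# 'a' and 'A' are different variables because R is case sensitive
a <- 3; A <- 10

# An expression is evaluated and printed, and its value is then lost
a + A

# An assignment stores the value and prints nothing
total <- a + A

# Braces group commands into one compound expression;
# its value is the value of the last command inside the braces
grouped <- {
  b <- a * 2
  b + 1
}
```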

Data permanency and removing objects

The entities that R creates and manipulates are known as objects. These may be variables, arrays
of numbers, character strings, functions, or more general structures built from such components.
During an R session, objects are created and stored by name. The R command
> objects()
(alternatively, ls()) can be used to display the names of (most of) the objects which are currently
stored within R. The collection of objects currently stored is called the workspace. To remove
objects the function rm is available. For example:
> rm(x, y, z, ink, junk, temp, foo, bar)
All objects created during an R session can be stored permanently in a file for use in future
R sessions. At the end of each R session you are given the opportunity to save all the currently
available objects.
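
A small sketch of listing and removing objects follows. The object names are arbitrary examples, and before and after are helper variables introduced only to show the effect of rm().

```r
# Create a few objects in the workspace
x <- 1:3
y <- "a"
junk <- 0

before <- ls()   # names of the objects currently stored
rm(junk)         # remove one object by name
after <- ls()    # 'junk' is no longer listed
```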

Libraries and Packages in R

While R is an expansive language with a large number of routines already included, it doesn't
include everything, and has several specific areas of omission with respect to multivariate
analyses (e.g., no CCA). Fortunately, the core routines are easily augmented with additional
user-written routines which can be loaded into your copy of R. These routines are usually
provided in what R calls a ‘package’, which is a bundle containing the routine itself, help files,
often test data, and other items as necessary. Accordingly, it's necessary to know how to load
packages to make the most of R. Under Windows OS, click on the Packages menu and scroll down to
the Load package item. This will pop up a widget listing all available packages. To load a
package, simply click on it. Alternatively, call the library function with the package name as
its argument. For example, enter:
> library(MASS)
To see a list of installed packages, enter:
> library()
If the package you want is not installed, you will have to install it yourself. Again, depending
on operating system and program, the details are somewhat different.

Installing Packages or Libraries in R

The best repository for R packages is CRAN at http://cran.r-project.org/. R generally refers to
‘packages’ rather than ‘libraries’. If your machine and operating system are supported, it's
usually simpler to use the pre-compiled binaries. If your machine is on the internet, R has
routines available to automatically install or update packages from CRAN. Under Windows OS,
click on the Packages menu and scroll down to Install packages from CRAN. This will pop up a
widget that lists all the packages available for R under Windows. Simply click on the desired
package and it will install.

Arithmetic operators
The R language includes the usual arithmetic operators:
+ addition
- subtraction
* multiplication
/ division
^ or ** exponentiation

The left-pointing arrow (<-) is the assignment operator; it is composed of the two characters
< (less than) and - (dash or minus), with no intervening blanks, and is usually read as gets: “The
variable x gets the value c(1, 2, 3, 4)”. The equals sign (=) may also be used for assignment in
place of the arrow (<-), except inside a function call, where = is exclusively used to specify
arguments by name. Because reserving the equals sign for specification of function arguments
leads to clearer and less error-prone R code, we encourage you to use the arrow for assignment,
even where = is allowed. When the leftmost operation in a command is an assignment, nothing is
printed. Typing the name of a variable on its own causes its value to be printed.
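
The following short sketch illustrates the arithmetic operators and the assignment behaviour described above. The variable names are illustrative only.

```r
x <- c(1, 2, 3, 4)   # assignment: nothing is printed
x                    # typing the name prints the value

y <- 2 * x + 1       # arithmetic is applied element by element
p1 <- 2^3            # exponentiation with ^
p2 <- 2**3           # ** is accepted as a synonym for ^

# Inside a function call, = names an argument rather than assigning
m <- mean(x = x)
```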
Logical operators

Operator     Description
<            less than
<=           less than or equal to
>            greater than
>=           greater than or equal to
==           exactly equal to
!=           not equal to
!x           not x
x | y        x OR y
x & y        x AND y
isTRUE(x)    test if x is TRUE
# An example
x <- c(1:10)
x[(x>8) | (x<5)]
# yields 1 2 3 4 9 10

# How it works
x <- c(1:10)
x
# 1 2 3 4 5 6 7 8 9 10
x > 8
# FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE

Variables and Types

Like most programming languages, R allows users to create variables, which are essentially
named computer memory. For example, you may store the number of species in a sample in a
variable. Variables are identified by a name assigned when they are created. Names should be
unique, and long enough to clearly identify the contents of the variable. Variable names in R are
composed of letters (a–z, A–Z), numerals (0–9), periods (.), and underscores (_), and they may
be arbitrarily long. The first character must be a letter or a period, but variable names beginning
with a period are reserved by convention for special purposes. Names in R are case sensitive; so,
for example, x and X are distinct variables. They may not start with a number or an underscore,
or include the character "$" or any arithmetic symbols, as these have special meaning in R.
Variables are assigned a value in an assignment statement, which in R has the variable name to the
left of a left-pointing arrow (typed with the "less than" followed by a "dash") with the value to
the right of the arrow. For example,
Age <- 2

Real or floating point numbers can be entered with just a decimal point, or in exponential
notation, where 1.0e-10 means .0000000001. Character variables, called "strings", should be
entered in quotes (single or double, it doesn't matter as long as they match). The logical value
TRUE is not surrounded by quotes: it is not the WORD TRUE, but rather the VALUE TRUE. Logical
variables can only take the values TRUE or FALSE. Unlike many programming languages (e.g.
FORTRAN or C) you do not have to tell R what kind of value (integer, real, or character) a
variable will contain; it can tell when the variable is assigned. R will only allow the
appropriate operations to be performed on a variable. For example, if species.name is a character
variable, then species.name + 37 produces an error, because R does not allow us to add 37 to a
character string.
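
A minimal sketch of the variable types discussed above. All values, including the species name, are hypothetical and chosen only for illustration; tryCatch() is used so that the expected error is captured instead of stopping the script.

```r
n.species    <- 12L             # integer
small.number <- 1.0e-10         # real number in exponential notation
species.name <- "Parus major"   # character string (hypothetical value)
present      <- TRUE            # logical value, entered without quotes

# R works out the type of each variable when it is assigned
classes <- c(class(n.species), class(small.number),
             class(species.name), class(present))

# Adding a number to a character string is not allowed; capture the error
result <- tryCatch(species.name + 37, error = function(e) "error")
```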

Data Structures

R is a 4th generation language, meaning that it includes high-level routines for working with data
structures, rather than requiring extensive programming by the analyst. In R there are 4 primary
data structures we will use repeatedly.

1. Vectors --- vectors are one-dimensional ordered sets composed of a single data type. Data
types include integers, real numbers, and strings (character variables).
2. Matrices --- matrices are two dimensional ordered sets composed of a single data type,
equivalent to the concept of matrix in linear algebra.
3. Data frames --- data frames are two-dimensional tables of rows and columns, and can be
composed of different data types (although all data in a single column must be of the same
type). In addition, each column and row in a data frame may be given a label or name to
identify it. Data frames are equivalent to a flat file database, and similar to spreadsheets.
Accordingly, we often refer to specific columns in a data frame as "fields."
4. Lists --- lists are compound objects of associated data. Like data frames, they need not
contain only a single data type, but can include strings (character variables), numeric
variables, and even such things as matrices and data frames. In contrast to data frames,
list items do not have a row-column structure, and items need not be the same length;
some can be a single value, and others a matrix.
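
To make the four structures concrete, here is a small sketch that builds one of each. The object names and values are invented for illustration.

```r
# 1. Vector: one-dimensional, single data type
v <- c(2.5, 3.1, 4.8)

# 2. Matrix: two-dimensional, single data type
m <- matrix(1:6, nrow = 2, ncol = 3)

# 3. Data frame: columns may differ in type, rows and columns are named
df <- data.frame(site = c("A", "B", "C"), count = c(10L, 4L, 7L))

# 4. List: items of different types and lengths
l <- list(name = "plot 1", values = v, table = m)

str(l)   # compact display of the structure of the list
```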

Different Data inputting methods in R


(a) Vectors and assignment
R operates on named data structures. The simplest such structure is the numeric vector, which
is a single entity consisting of an ordered collection of numbers. To set up a vector named x, say,
consisting of five numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the R command
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
or
> assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))
The first form is an assignment statement using the function c(), which in this context can take
an arbitrary number of vector arguments and whose value is a vector obtained by concatenating its
arguments end to end.
Vector arithmetic
Vectors can be used in arithmetic expressions, in which case the operations are performed element
by element. Vectors occurring in the same expression need not all be of the same length. If they
are not, the value of the expression is a vector with the same length as the longest vector which
occurs in the expression. Shorter vectors in the expression are recycled as often as need be
(perhaps fractionally) until they match the length of the longest vector. In particular a constant
is simply repeated. So with x as assigned above and a vector y of 11 entries, created for example
by y <- c(x, 0, x), the command
> v <- 2*x + y + 1
generates a new vector v of length 11 constructed by adding together, element by element, 2*x
repeated 2.2 times, y repeated just once, and 1 repeated 11 times.
The elementary arithmetic operators are the usual +, -, *, / and ^ for raising to a power. In
addition all of the common arithmetic functions are available: log, exp, sin, cos, tan, sqrt and
so on all have their usual meaning. max and min select the largest and smallest elements of a
vector respectively. range is a function whose value is a vector of length two, namely
c(min(x), max(x)). length(x) is the number of elements in x, sum(x) gives the total of the
elements in x, and prod(x) their product. Two statistical functions are mean(x), which calculates
the sample mean, which is the same as sum(x)/length(x), and var(x), which gives
sum((x-mean(x))^2)/(length(x)-1) or sample variance. If the argument to var() is an n-by-p matrix
the value is a p-by-p sample covariance matrix obtained by regarding the rows as independent
p-variate sample vectors. sort(x) returns a vector of the same size as x with the elements
arranged in increasing order; however there are other more flexible sorting facilities available.
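
The recycling rule and the summary functions above can be checked with a short sketch. It assumes y is defined as c(x, 0, x), a vector of 11 entries. Because 11 is not a multiple of 5, R issues a warning about the fractional recycling, which is suppressed here.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
y <- c(x, 0, x)                       # assumed definition: 11 entries

# x is recycled 2.2 times and the constant 1 is repeated 11 times
v <- suppressWarnings(2*x + y + 1)
length(v)

r  <- range(x)   # same as c(min(x), max(x))
m  <- mean(x)    # same as sum(x)/length(x)
s2 <- var(x)     # same as sum((x-mean(x))^2)/(length(x)-1)
sort(x)          # elements in increasing order
```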
(b) Matrix

A matrix can be created by simply binding together two or more vectors of the same type and
length. For example, if we create two vectors demo.vector1 and demo.vector2, we can then bind
them together using the cbind() function to create a matrix:

demo.matrix <- cbind(demo.vector1, demo.vector2)

A matrix can also be created directly with the matrix() function:

demo.matrix <- matrix(c(1,4,2,6,12,4,2,1,2,4), byrow=FALSE, nrow=5, ncol=2)

Matrices are specified in the order "row, column", so that demo.matrix[4,2] represents the
element at row 4 and column 2 in matrix demo.matrix. Individual rows or columns within a
matrix can be referred to by implied subscript, where the value of the desired row or column is
specified, but other values are omitted. For example, demo.matrix[,2] represents the second
column of matrix demo.matrix, as the row number before the comma was omitted. Similarly,
demo.matrix[5,] represents row 5 of matrix demo.matrix, as the column after the comma was
omitted.
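
A runnable sketch of the matrix construction and indexing described above. The contents of demo.vector1 and demo.vector2 are not given in these notes, so the values here are an assumption, chosen to match the matrix() call.

```r
# Assumed values for the two vectors
demo.vector1 <- c(1, 4, 2, 6, 12)
demo.vector2 <- c(4, 2, 1, 2, 4)

# Bind the vectors together as columns
demo.matrix <- cbind(demo.vector1, demo.vector2)

# The same values filled column by column with matrix()
same <- matrix(c(1,4,2,6,12,4,2,1,2,4), byrow = FALSE, nrow = 5, ncol = 2)

demo.matrix[4, 2]   # element at row 4, column 2
demo.matrix[, 2]    # second column
demo.matrix[5, ]    # fifth row
```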

Matrix multiplication

Two matrices A and B can be multiplied using A%*%B. But if we want to get the 'term by term'
or to get the product of the corresponding elements of A and B we can use A*B.

The outer product of two arrays


An important operation on arrays is the outer product. If a and b are two numeric arrays,
their outer product is an array whose dimension vector is obtained by concatenating their two
dimension vectors (order is important), and whose data vector is got by forming all possible
products of elements of the data vector of a with those of b. The outer product is formed by
the special operator %o%:
> ab <- a %o% b
An alternative is
> ab <- outer(a, b, "*")
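
A small sketch of the outer product using two short vectors, with values chosen arbitrarily for illustration:

```r
a <- 1:3
b <- c(10, 20)

ab  <- a %o% b            # 3 by 2 array of all products a[i] * b[j]
ab2 <- outer(a, b, "*")   # the equivalent call to outer()
ab
```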
Matrix multiplication
The operator %*% is used for matrix multiplication. An n by 1 or 1 by n matrix may of course
be used as an n-vector if in the context such is appropriate. Conversely, vectors which occur in
matrix multiplication expressions are automatically promoted either to row or column vectors,
whichever is multiplicatively coherent, if possible (although this is not always unambiguously
possible, as we see later).
If, for example, A and B are square matrices of the same size, then
> A * B
is the matrix of element by element products and
> A %*% B
is the matrix product. If x is a vector, then
> x %*% A %*% x
is a quadratic form.
The function crossprod() forms “crossproducts”, meaning that crossprod(X, y) is the
same as t(X) %*% y but the operation is more efficient. If the second argument to crossprod()
is omitted it is taken to be the same as the first.
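
The difference between the element by element product and the matrix product can be seen in a short sketch. The matrices and the vector are arbitrary small examples.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)
B <- matrix(c(0, 1, 1, 0), nrow = 2)
x <- c(1, 2)

elementwise <- A * B          # products of corresponding elements
product     <- A %*% B        # matrix product
q           <- x %*% A %*% x  # quadratic form, returned as a 1 by 1 matrix
cp          <- crossprod(A, B)  # same as t(A) %*% B
```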
Linear equations and inversion
Solving linear equations is the inverse of matrix multiplication. When after
> b <- A %*% x
only A and b are given, the vector x is the solution of that linear equation system. In R,
> solve(A,b)
solves the system, returning x (up to some accuracy loss). Note that in linear algebra, formally
$x = A^{-1} b$, where $A^{-1}$ denotes the inverse of A, which can be computed by
solve(A) but rarely is needed. Numerically, it is both inefficient and potentially unstable to
compute x <- solve(A) %*% b instead of solve(A,b).
The quadratic form $x^{T} A^{-1} x$, which is used in multivariate computations, should be
computed by something like x %*% solve(A,x), rather than computing the inverse of A.
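
As an illustrative sketch, the following solves a small system with solve(A, b) and evaluates the quadratic form without forming the inverse explicitly. The matrix and vector are arbitrary, and the names x.solved and qf are introduced only for the example.

```r
A <- matrix(c(4, 1, 2, 3), nrow = 2)   # a nonsingular 2 by 2 matrix
x <- c(1, 2)
b <- A %*% x

# Recover x from A and b
x.solved <- solve(A, b)

# Quadratic form computed without computing the inverse of A
qf <- x %*% solve(A, x)
```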
Eigenvalues and eigenvectors
The function eigen(Sm) calculates the eigenvalues and eigenvectors of a symmetric matrix
Sm. The result of this function is a list of two components named values and vectors. The
assignment
> ev <- eigen(Sm)
will assign this list to ev. Then ev$val is the vector of eigenvalues of Sm and ev$vec is the
matrix of corresponding eigenvectors. Had we only needed the eigenvalues we could have used
the assignment:
> evals <- eigen(Sm)$values
evals now holds the vector of eigenvalues and the second component is discarded. If the
expression
> eigen(Sm)
is used by itself as a command the two components are printed, with their names.
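
A minimal sketch with a small symmetric matrix whose eigenvalues are easy to check by hand. The matrix is chosen for illustration only.

```r
Sm <- matrix(c(2, 1, 1, 2), nrow = 2)   # symmetric matrix

ev    <- eigen(Sm)           # list with components values and vectors
evals <- eigen(Sm)$values    # eigenvalues only, in decreasing order

# Check the defining property for the first eigenpair: Sm v = lambda v
lhs <- as.numeric(Sm %*% ev$vectors[, 1])
rhs <- ev$values[1] * ev$vectors[, 1]
```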
Singular value decomposition and determinants
The function svd(M) takes an arbitrary matrix argument, M, and calculates the singular value
decomposition of M. This consists of a matrix of orthonormal columns U with the same column
space as M, a second matrix of orthonormal columns V whose column space is the row space
of M and a diagonal matrix of positive entries D such that M = U %*% D %*% t(V). D is
actually returned as a vector of the diagonal elements. The result of svd(M) is actually a list of
three components named d, u and v, with evident meanings.
If M is in fact square, then, it is not hard to see that
> absdetM <- prod(svd(M)$d)
calculates the absolute value of the determinant of M. If this calculation were needed often with
a variety of matrices it could be defined as an R function
> absdet <- function(M) prod(svd(M)$d)
after which we could use absdet() as just another R function.
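
The decomposition and the absdet() function defined above can be checked on a small example. The matrix is arbitrary, and rebuilt is a helper name introduced for the sketch.

```r
M <- matrix(c(2, 0, 1, -3), nrow = 2)

s <- svd(M)   # list with components d, u and v

# D is returned as a vector, so diag() rebuilds the diagonal matrix
rebuilt <- s$u %*% diag(s$d) %*% t(s$v)

absdet <- function(M) prod(svd(M)$d)
absdet(M)     # compare with abs(det(M))
```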
Forming partitioned matrices, cbind() and rbind()
Matrices can be built up from other vectors and matrices by the functions cbind() and rbind().
Roughly, cbind() forms matrices by binding together matrices horizontally, or column-wise, and
rbind() vertically, or row-wise. In the assignment
> X <- cbind(arg_1, arg_2, arg_3, ...)
the arguments to cbind() must be either vectors of any length, or matrices with the same column
size, that is the same number of rows. The result is a matrix with the concatenated arguments
arg_1, arg_2, ... forming the columns. If some of the arguments to cbind() are vectors they may
be shorter than the column size of any matrices present, in which case they are cyclically
extended to match the matrix column size (or the length of the longest vector if no matrices are
given). The function rbind() does the corresponding operation for rows. In this case any vector
arguments, possibly cyclically extended, are of course taken as row vectors. Suppose X1 and X2
have the same number of rows. To combine these by columns into a matrix X, together with an
initial column of 1s, we can use
> X <- cbind(1, X1, X2)
The result of rbind() or cbind() always has matrix status.
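
A short sketch of cbind() and rbind(), including the cyclic extension of the constant 1. The contents of X1 and X2 are made up for the example.

```r
X1 <- matrix(1:6, nrow = 3)   # 3 by 2 matrix
X2 <- c(10, 20, 30)           # vector of length 3

# The constant 1 is cyclically extended to a column of three 1s
X <- cbind(1, X1, X2)

# rbind() treats its vector arguments as rows
R <- rbind(1:3, 4:6)
```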

Editing data
When invoked on a data frame or matrix, edit brings up a separate spreadsheet-like environment
for editing. This is useful for making small changes once a data set has been read. The command
> xnew <- edit(xold)
will allow you to edit your data set xold, and on completion the changed object is assigned
to xnew. If you want to alter the original dataset xold, the simplest way is to use fix(xold),
which is equivalent to xold <- edit(xold).
Use
> xnew <- edit(data.frame())
to enter new data via the spreadsheet interface.

Generating regular sequences


(c) The seq function
The function seq() is a more general facility for generating sequences. It has five arguments,
only some of which may be specified in any one call. The form seq(from, to, by) generates a
sequence running from the first value to the second in steps of the third:
> indices <- seq(1,10,2)
# indices is c(1, 3, 5, 7, 9)
> seq(2,10)
> seq(-5, 5, by=.2) -> s3
> s4 <- seq(length=51, from=-5, by=.2)
(d) rep function
A related function is rep() which can be used for replicating an object in various complicated
ways. The form rep(x, n) repeats x n times:
> y <- rep(1:3, 2)
# y is c(1, 2, 3, 1, 2, 3)
The simplest form is
> s5 <- rep(x, times=5)
which puts five copies of x end to end in s5.
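
The calls above can be run together as a quick check. In this sketch the argument length is spelled out as length.out, which is its full name in seq().

```r
indices <- seq(1, 10, 2)                         # 1 3 5 7 9
s3 <- seq(-5, 5, by = 0.2)                       # -5.0, -4.8, ..., 5.0
s4 <- seq(length.out = 51, from = -5, by = 0.2)  # same values as s3

y  <- rep(1:3, 2)                 # 1 2 3 1 2 3
s5 <- rep(indices, times = 5)     # five copies of indices end to end
```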
(e) The scan() function for reading in data from the console
For very small data vectors it is sometimes handy to read in data directly from the prompt. This
can be accomplished using the scan function from the command line. The scan function reads
the fields of data in the file as specified by the what option, with the default being numeric. If
the what option is specified to be what=character() or what=" " then all the fields will be read as
strings. If the data are a mix of numeric, string or complex data, then a list can be used in the
what option. The default separator for the scan function is any white space (single space, tab, or
new line). Because the default is space delimiting, you can enter data on separate lines. When all
the data have been entered, just hit the enter key twice which will terminate the scanning.
E.g.
# Reading in numeric data
> x <- scan()
1: 3 5 6
4: 3 5 78 29
8: 34 5 1 78
12:
Read 11 items

Suppose the data vectors are of equal length and are to be read in parallel. Further suppose
that there are three vectors, the first of mode character and the remaining two of mode numeric,
and the file is ‘input.dat’. The first step is to use scan() to read in the three vectors as a
list, as follows
> inp <- scan("input.dat", list("",0,0))
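
Because scan() normally waits for keyboard or file input, the sketch below uses its text argument to supply the same kind of data as a string, so it can be run non-interactively. The data values for the list example are invented, and label, u and w are illustrative names.

```r
# Numeric data, as if typed at the prompt
x <- scan(text = "3 5 6 3 5 78 29 34 5 1 78", quiet = TRUE)

# Three parallel vectors: one character, two numeric
inp <- scan(text = "a 1 2\nb 3 4\nc 5 6",
            what = list("", 0, 0), quiet = TRUE)

# Separate the components of the list into individual vectors
label <- inp[[1]]
u     <- inp[[2]]
w     <- inp[[3]]
```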

Importing data files using the scan function

The scan function is an extremely flexible tool for importing data. Unlike the read.table
function, however, which returns a data frame, the scan function returns a list or a vector. For the
what option, we use list and then list the variables, and after each variable, we tell R what type
of variable (e.g., numeric, string) it is. In the first example, the first variable is age, and we tell R
that age is a numeric variable by setting it equal to 0. The second variable is called name, and it
is denoted as a string variable by the empty quote marks.

# inputting a text file and outputting a list
(x <- scan("http://www.ats.ucla.edu/stat/data/scan.txt", what = list(age = 0, name = "")))

# inputting a text file and outputting a data frame
(test <- read.table("http://www.ats.ucla.edu/stat/data/test.txt", header = TRUE))
(f) The array() function
As well as giving a vector structure a dim attribute, arrays can be constructed from vectors by
the array function, which has the form
> Z <- array(data_vector, dim_vector)
For example, if the vector h contains 24 or fewer numbers, then the command
> Z <- array(h, dim=c(3,4,2))
would use h to set up a 3 by 4 by 2 array in Z. If the size of h is exactly 24 the result is the
same as
> Z <- h ; dim(Z) <- c(3,4,2)
However if h is shorter than 24, its values are recycled from the beginning again to make it up
to size 24. As an extreme but common example
> Z <- array(0, c(3,4,2))
makes Z an array of all zeros.
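
A short sketch of the three cases described above: exactly 24 values, fewer than 24 values, and a single constant. The contents of h are an arbitrary example.

```r
h <- 1:24

Z  <- array(h, dim = c(3, 4, 2))   # 3 by 4 by 2 array filled from h
Z2 <- h; dim(Z2) <- c(3, 4, 2)     # same result when h has exactly 24 values

# A shorter data vector is recycled from the beginning
short <- array(1:5, dim = c(3, 4, 2))

# A single constant fills the whole array
zeros <- array(0, c(3, 4, 2))
```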
(g) Data Frames

One of the most challenging tasks in data analysis is data preparation. R provides various
structures for holding data and many methods for importing data from both keyboard and
external sources. One of those structures is data frames. Data frames are the primary data
structure in R. A data frame is used for storing data tables. It is a list of vectors of equal length. A
data.frame object in R has similar dimensional properties to a matrix but it may contain
categorical data, as well as numeric. The standard is to put data for one sample across a row and
covariates as columns. On one level, as the notation will reflect, a data frame is a list. Each
component corresponds to a variable; i.e., the vector of values of a given variable for each
sample. A data frame is like a list with components as columns of a table.

Usage
data.frame(..., row.names = NULL, check.rows = FALSE,
check.names = TRUE,
stringsAsFactors = default.stringsAsFactors())

default.stringsAsFactors()

Arguments
...               these arguments are of either the form value or tag = value. Component
                  names are created based on the tag (if present) or the deparsed argument
                  itself.
row.names         NULL or a single integer or character string specifying a column to be used as
                  row names, or a character or integer vector giving the row names for the data
                  frame.
check.rows        if TRUE then the rows are checked for consistency of length and names.
check.names       logical. If TRUE then the names of the variables in the data frame are checked
                  to ensure that they are syntactically valid variable names and are not
                  duplicated. If necessary they are adjusted so that they are.
stringsAsFactors  logical: should character vectors be converted to factors? In versions of R
                  before 4.0.0 the ‘factory-fresh’ default is TRUE, which can be changed by
                  setting options(stringsAsFactors = FALSE). Since R 4.0.0 the default is FALSE.

For example
> x=18:29
> y=c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
We will now use R's data.frame command to create our first dataframe and store the results in
the variable village.
> village=data.frame(age=x,height=y)
> village
age height
1 18 76.1
2 19 77.0
3 20 78.1
4 21 78.2
5 22 78.8
…………….
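
The village example can be run as a complete script. The sketch below uses the same data and adds two lookups by column name to show how the data frame is accessed; the name height.at.20 is illustrative.

```r
x <- 18:29
y <- c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5)

# One row per observation, one named column per variable
village <- data.frame(age = x, height = y)

head(village, 5)   # first five rows, as printed above
village$height     # a column accessed by name

# Height recorded for age 20
height.at.20 <- village$height[village$age == 20]
```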

Making data frames


Objects satisfying the restrictions placed on the columns (components) of a data frame may be
used to form one using the function data.frame:
> accountants <- data.frame(home=statef, loot=incomes, shot=incomef)
A list whose components conform to the restrictions of a data frame may be coerced into a
data frame using the function as.data.frame().

The read.table() function

To read an entire data frame directly, the external file will normally have a special form.
The first line of the file should have a name for each variable in the data frame. Each additional
line of the file has as its first item a row label and the values for each variable.
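
To illustrate this file layout without relying on an external file, the sketch below writes a small file to a temporary location and reads it back. The file contents, the variable names and the row labels are all invented for the example.

```r
# Write a small file in the special form: the header line has one field
# fewer than the data lines, whose first item is a row label
path <- tempfile(fileext = ".dat")
writeLines(c("Price Floor Area",
             "h1 52.00 111 830",
             "h2 54.75 128 710",
             "h3 57.50 101 1000"), path)

# read.table() detects the header and uses the first column as row names
houses <- read.table(path)
unlink(path)   # remove the temporary file

houses
```

When the header line has the same number of fields as the data lines, header = TRUE has to be given explicitly.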

(h) Importing data

Data frames can be accessed exactly as can matrices, but can also be accessed by data frame and
column or field name, without knowing the column number for a specific data item. For
illustration, let’s load the testbird.csv dataset by typing the following (we will come back to
importing and exporting data later):

> testbird <- read.csv("D:/testbird.csv", header=TRUE)

where you will have to put in the correct local path to the folder containing the testbird dataset.
Alternatively, you can choose the file interactively:

> testbird <- read.csv(file.choose(), header=TRUE)

Because the variable types are mixed in this incoming data set (containing both numeric and
character fields), the data structure will be classed as a data frame automatically.

Logical vectors
As well as numerical vectors, R allows manipulation of logical quantities. The elements of a
logical vector can have the values TRUE, FALSE, and NA (for “not available”, see below). The
first two are often abbreviated as T and F, respectively. The logical operators are <, <=, >, >=,
== for exact equality and != for inequality. In addition if c1 and c2 are logical expressions,
then c1 & c2 is their intersection (“and”), c1 | c2 is their union (“or”), and !c1 is the
negation of c1. Logical vectors may be used in ordinary arithmetic, in which case they are coerced
into numeric vectors, FALSE becoming 0 and TRUE becoming 1.
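
A brief sketch of logical vectors and their coercion to numbers. The data values are arbitrary.

```r
x <- c(3, 9, 12, 7)

big    <- x > 8                  # FALSE TRUE TRUE FALSE
both   <- (x > 5) & (x < 10)     # "and": TRUE where both conditions hold
either <- (x < 5) | (x > 10)     # "or": TRUE where at least one holds
negate <- !big                   # negation

# In arithmetic TRUE counts as 1 and FALSE as 0,
# so sum() counts the elements greater than 8
n.big <- sum(big)
```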
Missing values
When an element or value is “not available” or a “missing value” in the statistical
sense, a place within a vector may be reserved for it by assigning it the special
value NA. In general any operation on an NA
becomes an NA. The function is.na(x) gives a logical vector of the same size as x
with value TRUE if and
only if the corresponding element in x is NA.
> z <- c(1:3,NA); ind <- is.na(z)
Notice that the logical expression x == NA is quite different from is.na(x), since NA
is not really a value but a marker for a quantity that is not available. Thus x == NA
is a vector of the same length as x all of whose values are NA, as the logical
expression itself is incomplete and hence undecidable.
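The difference can be seen directly in a small sketch based on the vector z defined above:

```r
z <- c(1:3, NA)
ind <- is.na(z)                 # FALSE FALSE FALSE TRUE
cmp <- z == NA                  # NA NA NA NA: every comparison is undecidable
m_all  <- mean(z)               # NA, because one element is missing
m_seen <- mean(z, na.rm = TRUE) # mean of the non-missing values
```

Many summary functions accept na.rm = TRUE as a way to skip missing values.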

Character vectors
Character quantities and character vectors are used frequently in R, for example as
plot labels.
Where needed they are denoted by a sequence of characters delimited by the
double quote character, e.g., "x-values", "New iteration results". Character strings
are entered using either matching double (") or single (') quotes, but are printed
using double quotes (or sometimes without quotes).
Index vectors; selecting and modifying subsets of a data set
Subsets of the elements of a vector may be selected by appending to the name of the
vector an index vector in square brackets. Such index vectors can be any of four
distinct types.
1. A logical vector. In this case the index vector must be of the same length as the
vector from which elements are to be selected. Values corresponding to TRUE in the
index vector are selected and those corresponding to FALSE are omitted. For
example
> y <- x[!is.na(x)]
creates (or re-creates) an object y which will contain the non-missing values of x, in
the same order. Note that if x has missing values, y will be shorter than x. Also
> (x+1)[(!is.na(x)) & x>0] -> z
creates an object z and places in it the values of the vector x+1 for which the
corresponding value in x was both non-missing and positive.
2. A vector of positive integral quantities. The corresponding elements of the vector
are selected and concatenated, in that order, in the result. For example
> x[1:10]
selects the first 10 elements of x (assuming length(x) is not less than 10). Also
> c("x","y")[rep(c(1,2,2,1), times=4)]
(an admittedly unlikely thing to do) produces a character vector of length 16
consisting of "x", "y", "y", "x" repeated four times.
3. A vector of negative integral quantities. Such an index vector specifies the values
to be excluded rather than included. Thus
> y <- x[-(1:5)]
gives y all but the first five elements of x.
4. A vector of character strings. This possibility only applies where an object has a
names attribute to identify its components. In this case a sub-vector of the names
vector may be used in the same way as the positive integral labels in item 2 above.
> fruit <- c(5, 10, 1, 20)
> names(fruit) <- c("orange", "banana", "apple", "peach")
> lunch <- fruit[c("apple","orange")]
An indexed expression can also appear on the receiving end of an assignment, in which
case only the selected elements are replaced. For example
> x[is.na(x)] <- 0
replaces any missing values in x by zeros and
> y[y < 0] <- -y[y < 0]
has the same effect as
> y <- abs(y)
For example, testbird is a bird abundance data frame containing 32 sample plots (rows) and 10
fields (columns). The first 3 fields (columns) contain plot identifiers. The first two are character
fields (BASIN and SUB) and the third field is numeric (BLOCK). The remaining 7 fields
(columns) are numeric and contain abundances for 7 different bird species. Given this data
structure, we can perform the following:
x<-max(testbird[,5]) # assigns to x the maximum value of the second species (fifth column)
# among all plots
Alternatively, because testbird is a data frame, we can accomplish the same thing with
the following:
x<-max(testbird$AMRO)
y<-sum(testbird[,5]) # assigns the sum of the second listed species' abundance in all plots to y
x<-log(testbird[,4:10]+1) # creates a new data frame called 'x' with all values the log of the
# respective values in columns 4 through 10 in testbird (+1 to avoid log(0), which is undefined)
In addition, R supports logical subscripts, where the subscript selects the elements for which
the logical expression is true.
For example,
x<-sum(testbird[,5]>1) # assigns the number of plots where the abundance of the species in
# column 5 is greater than 1 (testbird[,5]>1 is evaluated as 1 (true) or 0 (false), so that the
# sum is of 0's and 1's).
x<-sum(testbird[,5][testbird[,5]>1]) # assigns the sum of the abundance for the species in
# column 5 in plots where the species in column 5 has abundance greater than 1.
x<-max(testbird[,5][testbird$BHGR==5]) # assigns the maximum abundance for the species in
# column 5 for plots with the abundance of BHGR equal to 5.

Editing data

When invoked on a data frame or matrix, edit brings up a separate spreadsheet-like
environment for editing. This is useful for making small changes once a data set has
been read. The command
> xnew <- edit(xold)
will allow you to edit your data set xold, and on completion the changed object is
assigned to xnew. If you want to alter the original dataset xold, the simplest way is
to use fix(xold), which is equivalent to xold <- edit(xold).
Use
> xnew <- edit(data.frame())
to enter new data via the spreadsheet interface.
Missing Values

One final special case deserves attention. Missing values in a vector or matrix are always a problem
in data sets. Sometimes it is best simply to remove samples with missing data, but often only one
or a few values are missing, and it's best to keep the sample in the matrix with a suitable missing
value code. Let’s assume that we have missing values in a vector. First, select the fourth column
from the testbird data frame, which contains a single missing value:
x<-testbird[,4]
To use all of the vector EXCEPT the missing value, use:
y<-x[!is.na(x)]
The R function to identify a missing value is:
is.na( )
so that to say all of a vector except missing values, we set a logical test to be true when values
are not missing. Since the R operator for ‘not’ is !, the correct test is:
!is.na( )
and to specify which vector we're testing for missing value, we put the vector in parentheses as
follows:
!is.na(x)
Accordingly, the full expression is
x[!is.na(x)]
Handling missing values in this way matters because the vectors or matrix columns that are
combined in a calculation must have the same number of elements. So, if there are missing values
in any field we're using in a calculation, the same record (row) must be omitted from all the other
fields as well.
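As a minimal sketch of dropping the same record from every field, consider a made-up data frame d with two species columns (sp1 and sp2 are hypothetical names, not columns of testbird):

```r
# Hypothetical four-plot data set; sp1 has one missing value.
d <- data.frame(sp1 = c(2, NA, 5, 7), sp2 = c(1, 4, 3, 6))

keep <- !is.na(d$sp1)     # TRUE for the plots where sp1 was recorded
sp1 <- d$sp1[keep]
sp2 <- d$sp2[keep]        # drop the same plot from the other field
total <- sp1 + sp2        # both vectors now have the same length

# complete.cases() flags rows with no missing value in any column
same_rows <- identical(keep, complete.cases(d))
```

For a whole data frame, d[complete.cases(d), ] keeps only the fully observed rows in one step.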

Functions in R

A function call consists of the function name followed by its arguments (or parameters),
enclosed in parentheses, which the function needs in order to do its work. A simple function that
we have already used is the sum() function, which returns the sum of all the values present in its
arguments. In its simplest form, sum() is called with two arguments:

sum(x, na.rm = FALSE)

The first argument, x, is the data set (either a vector, matrix, or data frame containing all numeric
variables) you wish to sum, and the second argument indicates whether missing values should be
ignored. The default na.rm=FALSE will return NA if there are any missing values, whereas
na.rm=TRUE will ignore the missing values when calculating the sum.

In the case of the testbird data set, applying the sum function to the species abundance fields
(columns 4-10) with the default argument of na.rm=FALSE returns NA, because column 4
contains a missing value:

sum(testbird[,4:10])

Note, there is no need to include the arguments if you wish to use the defaults provided.
Applying the sum function with the missing values argument set to TRUE returns the total of the
non-missing abundances:

sum(testbird[,4:10],na.rm=TRUE)

Note, in this case the sum is across all elements in all rows and in columns 4 through 10. The
apply() function we used above is a special function that allows us to apply other functions to
each column or row of the matrix. In this case, we applied the sum() function to each species
column of the testbird data set and returned a vector of values containing the sum of abundance
for each species.

When using functions it is important to understand the arguments of the function. The arguments
of a function are all defined in the associated help file. Each function has one or more named
arguments. Some or all of the arguments may come with default values, in which case you do not
need to specify any arguments inside the () when calling the function. However, in most cases
one or more of the arguments will not have a default value and thus you must provide a value for
the argument. For example, the sum() function requires that you specify a data set (an object,
either a vector, matrix, or data frame containing all numeric variables). If you do not specify a
value for this argument, you will get an error message.

In addition, if you specify values for arguments in the order that they are given in the written
function, then the arguments do not need to be named explicitly in the function call. For
example, in the apply() function, the following two calls are equivalent:

apply(X=testbird[,4:10], MARGIN=2, FUN=sum)

apply(testbird[,4:10],2,sum)

This is because in the second call the arguments are given in the same order as expected. If,
however, you want to specify the arguments in a different order from the default, then the
argument names must be included in the function call, e.g.:

apply(testbird[,4:10], FUN=sum, MARGIN=2)

In practice, explicitly naming the arguments often is required when you want to only specify say
the first and fourth argument of the function and accept the default values for the second and
third. In this case, you do not need to name the first argument if given first in your call, but you
must name the fourth argument.
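The matching rules above can be checked on a small matrix m, used here as a stand-in for testbird[,4:10] so that the sketch runs without the data file:

```r
m <- matrix(1:6, nrow = 2)                 # 2 rows, 3 columns

a <- apply(X = m, MARGIN = 2, FUN = sum)   # all arguments named
b <- apply(m, 2, sum)                      # positional, in the declared order
d <- apply(m, FUN = sum, MARGIN = 2)       # reordered, so the names are required
all_same <- identical(a, b) && identical(b, d)

# Supplying only the first and fourth arguments of seq(): the first is matched
# by position, the fourth (length.out) must be named.
s <- seq(0, length.out = 5)
```

Naming arguments costs a little typing but makes calls easier to read and protects against mistakes in the order.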

Functions are essential to working with R. You will be using functions constantly to manipulate,
summarize, analyze, and graphically display your data. For most of the things you will need to
do in this course, functions have already been written by others and you will simply need to
know how to call these functions and interpret their output. However, you can’t work long in R
without confronting the need to construct your own functions. In most cases, these will be
functions that call or make use of existing R functions, but in particular ways suited to your
applications. Throughout this course, we will make extensive use of existing R functions to
complete projects, but there may be a need or opportunity for you to create your own functions.
Any time you issue a set of commands that you anticipate having to repeat or reuse in the future,
you should consider writing a function. Although we will not go into the details of writing
functions here, you can easily review the code for a function by simply typing the function name
at the console.

Some useful functions


builtins() # List all built-in functions
options() # Set options to control how R computes & displays results
?NA # Help page on handling of missing data values
abs(x) # The absolute value of "x"
append() # Add elements to a vector
c(x) # A generic function which combines its arguments
cat(x) # Prints the arguments
cbind(), rbind() # Combine vectors by column/row (cf. "paste" in Unix)
diff(x) # Returns suitably lagged and iterated differences
gl() # Generate factors with the pattern of their levels
grep() # Pattern matching
identical() # Test if 2 objects are *exactly* equal
jitter() # Add a small amount of noise to a numeric vector
julian() # Return Julian date
length(x) # Return no. of elements in vector x
ls() # List objects in current environment
mat.or.vec() # Create a matrix or vector
paste(x) # Concatenate vectors after converting to character
range(x) # Returns the minimum and maximum of x
rep(1,5) # Repeat the number 1 five times
rev(x) # List the elements of "x" in reverse order

seq(1,10,0.4) # Generate a sequence (1 -> 10, spaced by 0.4)
sequence() # Create a vector of sequences
sign(x) # Returns the signs of the elements of x
sort(x) # Sort the vector x
order(x) # Return the index permutation that sorts x
tolower(),toupper() # Convert string to lower/upper case letters
unique(x) # Remove duplicate entries from vector
system("cmd") # Execute "cmd" in operating system (outside of R)
vector() # Produces a vector of given length and mode
floor(x), ceiling(x), round(x), signif(x), trunc(x) # rounding functions
Sys.time() # Return system time
Sys.Date() # Return system date
getwd() # Return working directory
setwd() # Set working directory
list.files() # List files in a given directory
file.info() # Get information about files
log(x),logb(),log10(),log2(),exp(),expm1(),log1p(),sqrt() # Fairly obvious
cos(),sin(),tan(),acos(),asin(),atan(),atan2() # Usual stuff
cosh(),sinh(),tanh(),acosh(),asinh(),atanh() # Hyperbolic functions
union(),intersect(),setdiff(),setequal() # Set operations
+,-,*,/,^,%%,%/% # Arithmetic operators
<,>,<=,>=,==,!= # Comparison operators
eigen() # Computes eigenvalues and eigenvectors
deriv() # Symbolic and algorithmic derivatives of simple expressions
integrate() # Adaptive quadrature over a finite or infinite interval
sum() # Sum of all the values in its arguments

Writing user defined functions


It should be emphasized that most of the functions supplied as part of the R system, such
as mean(), var(), postscript() and so on, are themselves written in R and thus do not differ
materially from user written functions.
A function is defined by an assignment of the form
function_name <- function(arg1, arg2, arg3 = 2, ... ){
expression
return(object)
}

function_name: is the function's name. This can be any valid variable name, but you should
avoid using names that are already used elsewhere in R, such as dir, sum, plot, etc.

arg1, arg2, arg3: these are the arguments of the function, also called formals. You can write a
function with any number of arguments. These can be any R object: numbers, strings, arrays,
data frames, or even other functions; anything that is needed for the function_name
function to run.

Some arguments have default values specified, such as arg3 in our example. Arguments without
a default must have a value supplied for the function to run. You do not need to provide a value
for those arguments with a default, as the function will use the default value.

Function body: The code within the {} brackets is run every time the function is called. This
code might be very long or very short. Ideally functions are short and do just one thing; problems
are rarely too small to benefit from some abstraction.

Return value: The value of the last expression evaluated in the body is returned by the function,
unless return() is called earlier. A function need not return anything useful: for example, a
function that makes a plot might be called only for its side effect, whereas a function that does a
mathematical operation might return a number or a list.
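A minimal sketch of these pieces follows. The function scaled_sum and its arguments are hypothetical names chosen for illustration.

```r
# arg3 has a default value; arg1 and arg2 do not and must be supplied.
scaled_sum <- function(arg1, arg2, arg3 = 2) {
  total <- arg1 + arg2
  total / arg3            # last expression evaluated: this is the return value
}

r1 <- scaled_sum(3, 5)            # uses the default arg3 = 2
r2 <- scaled_sum(3, 5, arg3 = 4)  # overrides the default
default_arg3 <- formals(scaled_sum)$arg3   # the stored default
```

Calling scaled_sum(3) fails when the body tries to use arg2, because arg2 has no default.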

Examples:
(1)
f1 <- function(x, y) {
x+y
}

f1( 3, 4)
(2)
As another example, consider a function to calculate the two-sample t-statistic, showing "all the
steps". This is an artificial example, of course, since there are other, simpler ways of achieving
the same end.
The function is defined as follows:
twosamplet <- function(y1, y2) {
  n1 <- length(y1); n2 <- length(y2)       # sample sizes
  yb1 <- mean(y1); yb2 <- mean(y2)         # sample means
  s1 <- var(y1); s2 <- var(y2)             # sample variances
  s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)   # pooled variance
  tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
  tst
}
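One of the "simpler ways" is the built-in t.test() with var.equal = TRUE, which uses the same pooled variance. The sketch below repeats the definition so that it runs on its own, and compares the two on simulated data (the samples a and b are made up):

```r
twosamplet <- function(y1, y2) {
  n1 <- length(y1); n2 <- length(y2)
  yb1 <- mean(y1); yb2 <- mean(y2)
  s1 <- var(y1); s2 <- var(y2)
  s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)
  (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
}

set.seed(10)                       # reproducible illustration
a <- rnorm(12, mean = 5)
b <- rnorm(15, mean = 6)

mine <- twosamplet(a, b)
ref  <- unname(t.test(a, b, var.equal = TRUE)$statistic)
agree <- isTRUE(all.equal(mine, ref))
```

Checking a hand-written function against a trusted built-in is a cheap way to catch algebra mistakes.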

(3)
f.good <- function(x, y) {
z1 <- 2*x + y
z2 <- x + 2*y
z3 <- 2*x + 2*y
z4 <- x/y
return(c(z1, z2, z3, z4))
}

f.good(1, 2)

(4)
intsum <- function(from=1, to=10)
{
sum <- 0
for (i in from:to)
sum <- sum + i
sum
}
intsum(3) # Evaluates sum from 3 to 10 …
intsum(to = 3) # Evaluates sum from 1 to 3 …
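(5) Newton Raphson method
f <- function(x) x^3 - x - 1       # function whose root is sought
f1 <- function(x) 3*x^2 - 1        # its derivative

NR <- function(x, tol = 0.0001, maxit = 100) {
  i <- 1
  h <- -f(x)/f1(x)                 # Newton step from the starting value
  # keep stepping while the step is still large AND the iteration limit is not reached
  while (abs(h) > tol && i < maxit) {
    x <- x + h
    h <- -f(x)/f1(x)
    i <- i + 1
  }
  return(x)
}

NR(1.5)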

Control statements
Control structures commonly used in R include:
if, else: testing a condition
for: execute a loop for a fixed number of times
while: execute a loop while a condition is true
repeat: execute a loop until seeing a break
break: break the execution of a loop
next: skip an iteration of a loop
return: exit a function
(1) if … else …
if (expr_1) {
expr_2
} else {
expr_3
}
The first expression should return a single logical value.

Example: 1
a<- -2
if(a<0){
cat(a, "is a negative number")
}else{
cat(a,"is a non-negative number")}
Example: 2
n=32
if(n%%2!=0){
print("n is not even")
}else{
print("n is even")}

(2) for
for (name in expr_1)
expr_2

Example-1: Print the integers 1 to 10

for (i in 1:10)
print(i)

Example-2 Random walk

z<-rep(0,1000) # z[1] = 0 is the starting position
for(i in 2:1000){
coin<-rbinom(1,1,0.5) # fair coin: 1 = heads, 0 = tails
if(coin==1){
z[i]=z[i-1]+1 # step up from the previous position
}else {
z[i]=z[i-1]-1 # step down from the previous position
}
}
plot(z,type="l")
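The same kind of walk can be written without an explicit loop. This is a sketch of an alternative, not the approach used in the notes: draw all the steps at once and accumulate them with cumsum().

```r
set.seed(1)                                        # reproducible illustration
steps <- sample(c(-1, 1), 1000, replace = TRUE)    # fair +1 / -1 steps
z <- cumsum(steps)                                 # position after each step
plot(z, type = "l", xlab = "step", ylab = "position")
```

Vectorized code like this is usually shorter and faster in R than the equivalent for loop.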

(3) The repeat statement

# Sample with replacement from the integers 1 to N until the number 15 is sampled twice
M <- 0 # M counts the number of draws needed to meet the criterion in this run
matches <- 0
N <- 100 # draws are integers between 1 and N
repeat
{
# Keep track of the total number of draws
M <- M + 1
# Draw one integer at random from 1 to N
p <- sample(N, 1)
# Increment matches whenever we sample 15
if (p == 15)
matches <- matches + 1
# Stop after 2 matches
if (matches == 2)
break
}
M
(4) The while statement
while (expr_1)
expr_2
Here, as long as expr_1 is true, expr_2 is evaluated repeatedly. break and next statements can be
used within the loop.
Example: Random walk

z<-5
path<-z # record the positions visited
while(z>=3&&z<=10){
print(z)
coin<-rbinom(1,1,0.5)
if(coin==1){
z=z+1
}else {
z=z-1
}
path<-c(path,z)
}
plot(path,type="b")

Random Generation
- runif(n, min = 0, max = 1): samples from the Uniform distribution
- rbinom(n, size, prob): samples from the Binomial distribution
- rnorm(n, mean = 0, sd = 1): samples from the Normal distribution
- rexp(n, rate = 1): samples from the Exponential distribution
- rt(n, df): samples from the t-distribution
- And others!
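A short sketch of these generators in use. The sample sizes and parameter values are arbitrary; set.seed() makes the draws reproducible.

```r
set.seed(42)
u  <- runif(5)                          # uniform on (0, 1)
b  <- rbinom(5, size = 10, prob = 0.3)  # number of successes in 10 trials
z  <- rnorm(5, mean = 0, sd = 1)        # standard normal
e  <- rexp(5, rate = 1)                 # exponential with rate 1
tt <- rt(5, df = 4)                     # t with 4 degrees of freedom
```

Each r-function has companions for the density, distribution function and quantiles (for the normal: dnorm(), pnorm(), qnorm()).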

Plotting in R

R has a powerful graphics capability that is much of the appeal to using the system. Many of the
analyses have special plotting capabilities that allow you to plot results without storing multiple
intermediate products.

To get a quick feel for how easy it is to create plots, let's first create a simple data set containing
two numeric variables:

x<-1:50 # creates an ordered vector with elements 1 through 50

y<-rnorm(50,0,1) # creates a vector of random numbers; length 50; mean 0; standard deviation 1

Now we can produce a simple scatter plot of y against x using the basic plot() function. Simply type:

plot(x,y) # note, a call to any of the plotting functions will automatically open up a graphics
# device and display the results in that device.

We can change just about any aspect of the plot with a bewildering array of graphical controls
given as arguments to the plot function. Here are some examples for you to try:

plot(x,y,type='o') # to change the type of plot, try type='l','o','b', and 's'

plot(x,y,type='o',lty=2) # to change the line type try lty=1,2,3,4,...

plot(x,y,type='o',col='blue') # to change the line color try col='blue','red','green', ...

plot(x,y,type='o',pch=2) # to change the point symbols try pch=1,2,3,4,...

plot(x,y,type='o',cex=2) # to change the point size try cex=2,3,4,...

plot(x,y,type='o',lwd=2) # to change the line width try lwd=2,3,4,...

To see a complete list of plot controls, look at the help file for the par() function:

help(par)

Most or all of the par commands to control the graphics can be given as arguments to the plot
function (as above). However, it is also possible to set these graphics controls for the graphics
device being used so that all plots to that device will adopt these same controls. This is done by
issuing a par() command before a plot command. Some examples are as follows:

par(mfrow=c(2,3)) # partitions the page into 6 sections, 2 rows and 3 columns.

par(new=TRUE) # used to overlay different plot types

par(mar=c(0.6,0.5,0.1,0.5)) # specifies margin size in lines of text (bottom, left, top, right); use mai for inches

Of course there are many more options and these can all be specified in a single command, for
example:

par(mfrow=c(2,3),new=TRUE, mar=c(0.6,0.5,0.1,0.5))

Of course there are many different kinds of plots for displaying data. The basic plot() function is
simply a starting point. There are many different so-called “high-level” plotting functions, for
example:

plot(): line/point plots; output depends on class of data
hist(): histogram of single variable
boxplot(): box-and-whisker plot of single variable
qqnorm(): normal quantile-quantile plot of single variable (qqplot() compares two samples)
coplot(): graphs of 3 or more variables
image(): draws grid of rectangles using 3 variables
contour(): draws contours using 3 variables
persp(): draws 3D surface
pairs(): all pairwise plots between multiple variables

In addition, there are many so-called “low-level” plotting functions used to plot additional
elements over an existing plot (i.e., overlays). These low-level functions are always called after a
high-level command in order to supplement the high-level plot. Some examples of low-level
commands include:

points(): add points to an existing plot
lines(): add lines to an existing plot
text(): add text to an existing plot
abline(): draw a line in intercept and slope form across an existing plot
polygon(): draw a polygon on an existing plot
legend(): add a legend to an existing plot
title(): add a title to an existing plot
axis(): add further axis scales to an existing plot

There is of course much more detail to plotting in R, but this should suffice for now. We will be
making extensive use of the plotting capabilities of R throughout this course.
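To show how high-level and low-level functions combine, here is a small sketch on simulated data. The data, the fitted line and the label text are made up for illustration.

```r
set.seed(1)
x <- 1:50
y <- 0.5 * x + rnorm(50, 0, 3)          # hypothetical noisy linear data
fit <- lm(y ~ x)                        # least-squares line to overlay

# High-level call: opens the device and draws axes and points
plot(x, y, pch = 16, main = "High-level plot with low-level additions")

# Low-level calls: each one adds to the existing plot
abline(fit, col = "red", lwd = 2)                       # fitted line
points(x[y > 20], y[y > 20], col = "blue", cex = 1.5)   # highlight large values
text(10, max(y), "y = a + b x")                         # annotation
legend("bottomright", legend = c("data", "fit"),
       pch = c(16, NA), lty = c(NA, 1), col = c("black", "red"))
```

The order matters: calling plot() again would start a new figure and discard the overlays.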

2. Two-dimensional and High–dimensional scatter plots

1) Scatterplots, pairwise scatterplots

We use the famous (Fisher's) iris data set to illustrate the relationships between two or
more variables in a 2-dimensional plane

> plot(iris$Sepal.Length, iris$Sepal.Width, main = "Scatter plot", pch = 16)

However, we have more than two variables of interest. A set of pairwise scatterplots (sometimes
called a draftsman plot) may be of use:

> pairs(iris)

> pairs(iris[1:4])

> pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species", pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])

There are other useful functions available. For example, what does splom() do? (Look it up with ?splom.)

> library(lattice)

> ?splom

2) 3-D Scatter plots

There are facilities in R for making 3d effect scatterplots: you need to download and install an
additional package once, and then load it with library() in each session. It is just
possible to envisage the three dimensions on the printed page.

> install.packages("scatterplot3d")

> library(scatterplot3d)

> attach(iris)

> s3d<-scatterplot3d(Sepal.Length,Sepal.Width,Petal.Length, main="3D Scatterplot of Iris Data")

> s3d$points3d(Sepal.Length,Sepal.Width,Petal.Length,pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])

3) Spinning 3d Scatterplot using the function plot3d() from the rgl package (install it first with
install.packages("rgl") if needed)

> library(rgl)

> plot3d(Sepal.Length,Sepal.Width,Petal.Length, main = "3D Scatterplot of Iris Data", col=c("red", "green3", "blue")[unclass(iris$Species)])

> detach(iris)

3. Other types of 3-D plot methods: image() and contour()

A display example of the volcano:

> x <- 10*(1:nrow(volcano))

> y <- 10*(1:ncol(volcano))

> image(x, y, volcano, col = terrain.colors(100), axes = FALSE)

> contour(x, y, volcano, levels = seq(90, 200, by = 5), add = TRUE, col = "peru")

> axis(1, at = seq(100, 800, by = 100))

> axis(2, at = seq(100, 600, by = 100))

> box()

> title(main = "Maunga Whau Volcano", font.main = 4)

> abline(h = 200*0:4, v = 200*0:4, col = "lightgray", lty = 2, lwd = 0.1)

A Perspective plot of a surface over the x–y plane: persp()

> persp(x, y, volcano, theta = 30, phi = 30, expand = 0.5, col = "lightblue", xlab = "X", ylab =
"Y", zlab = "Altitude ")

Lab Exercise

1. The following are five measurements on the variables x1, x2, and x3:

x1: 9 2 6 5 8
x2: 12 8 6 4 10
x3: 3 4 0 2 1

a. Create 3 vectors x1, x2, and x3.

b. Construct the matrix

    9 12 3
    2  8 4
X = 6  6 0
    5  4 2
    8 10 1

c. Find the arrays x̄, S_n, and R using R.

Some useful R programs

Example-1: Program for exploratory data analysis

eda<-function(x)
{
par(mfrow=c(2,3)) # six panels: 2 rows, 3 columns
qqnorm(x)
qqline(x)
boxplot(x)
title("Box Plot")
hist(x,main="Histogram")
iqd<-summary(x)[5]-summary(x)[2] # interquartile distance (3rd quartile minus 1st quartile)
plot(density(x,width=2*iqd),xlab="x",ylab="",type="l")
ts.plot(x)
title("Time series Plot")
acf(x)
invisible()
}

# trackrecords is a data set that is assumed to have been loaded already
eda(trackrecords$marathon_min)

Example -2: Trapezoidal rule

f<-function(x)
{return(1/(1+x^2))}
h<-0.1 # step size
a<-0 # lower limit
b<-1 # upper limit
x<-seq(a,b, by=h)
y<-f(x)
n<-(b-a)/h # number of subintervals
coef=c(rep(1,1),rep(2,n-1),rep(1,1)) # weights 1, 2, ..., 2, 1
I=(h/2)*(sum(coef*y))
I

Example -3: Simpson's (1/3) rule

f<-function(x)
{return(1/(1+x^2))}
n=100 # number of subintervals; must be even
if(n%%2!=0){
print("n is not even")
}else{
a<-0
b<-1
h<-(b-a)/n
x<-seq(a,b, by=h)
y<-f(x)
coef=c(rep(1,1),rep(c(4,2),(n-2)/2),rep(c(4,1),1)) # weights 1, 4, 2, ..., 2, 4, 1
I=(h/3)*(sum(coef*y))
}
I
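Because the integral of 1/(1+x^2) over [0, 1] is atan(1) = pi/4, the accuracy of both rules can be checked. The sketch below wraps the two weight patterns in functions; the names trap and simp are chosen here for illustration and are not part of the notes.

```r
f <- function(x) 1/(1 + x^2)
exact <- atan(1)                     # integral of f over [0, 1] is pi/4

# Composite trapezoidal rule with n subintervals
trap <- function(f, a, b, n) {
  x <- seq(a, b, length.out = n + 1)
  h <- (b - a)/n
  w <- c(1, rep(2, n - 1), 1)        # weights 1, 2, ..., 2, 1
  (h/2) * sum(w * f(x))
}

# Composite Simpson 1/3 rule; n must be even
simp <- function(f, a, b, n) {
  stopifnot(n %% 2 == 0)
  x <- seq(a, b, length.out = n + 1)
  h <- (b - a)/n
  w <- c(1, rep(c(4, 2), (n - 2)/2), 4, 1)   # weights 1, 4, 2, ..., 2, 4, 1
  (h/3) * sum(w * f(x))
}

err_trap <- abs(trap(f, 0, 1, 10) - exact)    # h = 0.1 as in Example-2
err_simp <- abs(simp(f, 0, 1, 100) - exact)   # n = 100 as in Example-3
c(trapezoid = err_trap, simpson = err_simp)
```

Using length.out = n + 1 in seq() guarantees exactly n + 1 grid points, which avoids any doubt about rounding when the step is given with by.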

The proof that the Box-Muller transformation produces independent normal random variables
ends with the factorization of the joint density of Z = (Z1, Z2):

$$f_Z(z_1, z_2) = f_{Z_1}(z_1)\, f_{Z_2}(z_2)$$

This shows that the joint density of Z is the product of two standard normal densities, so Z1 and
Z2 are independent standard normal variables.
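The transformation itself is not written out in this part of the notes. The sketch below assumes its standard form, Z1 = sqrt(-2 ln U1) cos(2 pi U2) and Z2 = sqrt(-2 ln U1) sin(2 pi U2) with U1, U2 independent Uniform(0,1), and checks the sample moments.

```r
set.seed(123)
n  <- 10000
u1 <- runif(n)
u2 <- runif(n)

r  <- sqrt(-2 * log(u1))        # radius
z1 <- r * cos(2 * pi * u2)      # first normal variate
z2 <- r * sin(2 * pi * u2)      # second normal variate

# Means near 0, standard deviations near 1, correlation near 0
c(mean(z1), sd(z1), mean(z2), sd(z2), cor(z1, z2))
```

A near-zero sample correlation is consistent with independence but does not prove it; the factorization of the joint density is what establishes it.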

Application of accept reject method to Normal random number generation

Suppose we want to generate X ~ N(µ, σ²). Since X = σZ + µ, where Z denotes a rv with the
N(0,1) distribution, it suffices to find an algorithm for generating Z ~ N(0,1). Moreover, if we
can generate from the absolute value |Z|, then by symmetry we can obtain our Z by independently
generating a rv S (for sign) that is ±1 with probability 0.5 each and setting Z = S*|Z|. In other
words, we generate U ~ Uniform(0,1) and set Z = -|Z| if U ≤ 0.5 and Z = |Z| if U > 0.5. The
density of |Z| is

$$f(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad x \ge 0$$

For the instrumental density we take $g(x) = e^{-x}$, $x \ge 0$, the exponential density with rate 1,
something we already know how to easily simulate using the inverse transform method.

Now

$$h(x) = \frac{f(x)}{g(x)} = \sqrt{2/\pi}\; e^{x - x^2/2}$$

If we can find a maximum value M for h(x), so that $f(x)/g(x) \le M$, then $f(x) \le M g(x)$.
We simply use calculus to compute the maximum (solve h'(x) = 0); it must occur at the value of x
that maximizes the exponent $x - x^2/2$, namely x = 1. Therefore

$$M = h(1) = \sqrt{2e/\pi}$$

Further,

$$\frac{f(y)}{M g(y)} = \frac{\sqrt{2/\pi}\; e^{y - y^2/2}}{\sqrt{2e/\pi}} = e^{y - 1/2 - y^2/2} = e^{-(y-1)^2/2}$$

so the acceptance condition $U \le f(Y)/(M g(Y))$ means that $U \le e^{-(Y-1)^2/2}$.

Algorithm for the generation of Z

1. Generate Y with an exponential distribution at rate 1; that is, generate U and set Y = -ln(U)

2. Generate another U

3. If $U \le e^{-(Y-1)^2/2}$, set |Z| = Y; otherwise go back to 1

4. Generate another U. Set Z = -|Z| if U ≤ 0.5 and set Z = |Z| if U > 0.5

Note: $U \le e^{-(Y-1)^2/2}$ if and only if $-\log(U) \ge (Y-1)^2/2$, and since -log(U) is exponential
with rate 1, we can simplify the above algorithm as

1. Generate Y1 and Y2 with exponential distribution at rate 1; that is, generate U1 and U2 and set
Y1 = -ln(U1), Y2 = -ln(U2)

2. If $Y_2 \ge (Y_1 - 1)^2/2$, set |Z| = Y1; otherwise go back to 1

3. Generate another U. Set Z = -|Z| if U ≤ 0.5 and set Z = |Z| if U > 0.5

Example -3: R code for accept reject method

#############################################################################
n<-1000 # number of samples to be generated
mu<-5 # mean of the normal to be generated
sig<-1 # standard deviation of the normal to be generated
z<-rep(0,n) # vector to store the random numbers
i<-1 # index of the next value to be filled
while(i<=n){ # algorithm starts; a while loop is used because rejected draws must be repeated
u1<-runif(1) # uniform RN generated
u2<-runif(1) # second uniform RN generated
y1<- -log(u1) # exponential RN with rate 1 generated
y2<- -log(u2) # exponential RN with rate 1 generated
if(y2 >= (y1-1)^2/2){ # acceptance condition of the accept-reject method
u3<-runif(1) # uniform RN used to assign the sign
if(u3<0.5)
z[i]<- -y1 # negative sign assigned: standard normal RV generated
else
z[i]<-y1 # positive sign assigned: standard normal RV generated
i<-i+1 # move on only when a value was accepted
}
}
x<-mu+sig*z # converted into normal RV with mean mu and SD sig
hist(x) # histogram of the sample drawn
#############################################################################
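The accept-reject method accepts a proposal with probability 1/M, so with M = sqrt(2e/pi) roughly 76% of the proposals should be kept. The sketch below checks this empirically with a vectorized version of the simplified algorithm; the number of proposals m is arbitrary.

```r
set.seed(2024)
m  <- 100000                         # number of proposals
y1 <- rexp(m)                        # proposals Y1 from Exp(1)
y2 <- rexp(m)                        # Y2 plays the role of -log(U)
accept <- y2 >= (y1 - 1)^2 / 2       # acceptance condition of the algorithm

rate <- mean(accept)                 # empirical acceptance rate
M <- sqrt(2 * exp(1) / pi)           # bound on f/g
c(empirical = rate, theoretical = 1 / M)

# Attach random signs to the accepted values to obtain N(0,1) draws
z <- y1[accept] * sample(c(-1, 1), sum(accept), replace = TRUE)
c(mean = mean(z), sd = sd(z))
```

Generating proposals in bulk and filtering them avoids the explicit loop, at the cost of not knowing in advance exactly how many values will be accepted.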

Example -4: Newton Raphson method

g <-function(x)
x^3-2*x-5
derg <- function(x)
3*x^2-2
newton2 <- function(fun, derf, x0, eps, nlim){
iter <- 0
repeat{
iter <- iter+1
if(iter > nlim){
cat(" Iteration Limit Exceeded: Current = ",iter,fill = T)
x1 <- NA
break
}
x1 <- x0 - fun(x0)/derf(x0)
if(abs(x0 - x1) < eps||abs(fun(x1))<1.0e-12)
break
x0 <- x1
cat("******Iter. No: ", iter, " Current Iterate =", x1,fill=T)
}
return(x1)
}
newton2(g,derg,2.0,.00001,100)
################################
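The result of a hand-written root finder can be cross-checked against the built-in uniroot(), which searches an interval over which the function changes sign. This is a sketch for comparison only; the interval [2, 3] is chosen here because g(2) = -1 and g(3) = 16.

```r
g <- function(x) x^3 - 2*x - 5

# g changes sign on [2, 3], so the interval brackets a root
sol <- uniroot(g, interval = c(2, 3), tol = 1e-10)
root <- sol$root        # root estimate
resid <- g(root)        # should be close to 0
c(root = root, residual = resid)
```

Unlike Newton's method, uniroot() needs no derivative, but it does need a bracketing interval.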

Example -5:## bisection method

bisection <- function(f,lower,upper,tol=0.0000001){


while(abs(upper-lower)>tol){
middle <- (lower+upper)/2
if(f(middle)*f(lower)<0)upper <- middle
else lower <- middle}
return((lower+upper)/2)}

Example -6:## Iteration Method

iteration <- function(f,x0,tol=0.0000001){
x <- x0
while(abs(f(x)-x)>tol) x <- f(x)
return(x)}

Example -7:## Newton-Raphson Method

NR <- function(f,f1,x0,tol=0.000001){
x <- x0
delta <- f(x)/f1(x)
while(abs(delta)>tol){
x <- x-delta
delta <- f(x)/f1(x)}
return(x)}

##Problem 1
# Find the real root of the equation x^6 - x^4 - x^3 - 1 = 0
# between 1.4 and 1.5

# solution by bisection method


f1 <- function(x)x^6-x^4-x^3-1
curve(f1,1.4,1.5)
bisection(f = f1,lower = 1.4,upper = 1.5)

# solution by iteration method


# x = (1+x^3+x^4)^(1/6)
g1 <- function(x) (1+x^3+x^4)^(1/6)

iteration(f = g1,x0 = 1.45)

# solution by Newton Raphson method


f1 <- function(x)x^6-x^4-x^3-1
f11 <- function(x)6*x^5-4*x^3-3*x^2
NR(f=f1,f1=f11,x0=1.45)

## Problem 2 Find the root of the equation 2x = cos(x) + 3

# solution by bisection method


f2 <- function(x) cos(x)-2*x + 3
curve(f2,0,10)
bisection(f = f2,lower = 0,upper = 10)

# solution by iteration method


# x = (cos(x) + 3)/2
g2 <- function(x) (cos(x)+3)/2
iteration(f = g2,x0 = 5)

# solution by Newton Raphson method


f2 <- function(x) cos(x)-2*x + 3
f21 <- function(x) -sin(x)-2

NR(f=f2,f1=f21,x0=1.45)

## Problem 3 Find the root of the equation xlog(x,10)=4.77

# solution by bisection method


f3 <- function(x) x*log(x,10)-4.77
curve(f3,1,10)
bisection(f = f3,lower = 1,upper = 10)

# solution by iteration method


# x = 4.77/log(x,10)
g3 <- function(x) 4.77/log(x,10)
iteration(f = g3,x0 = 5)

# solution by Newton Raphson method


f3 <- function(x) x*log(x,10)-4.77
f31 <- function(x) log(x,10)+log(exp(1),10)
NR(f=f3,f1=f31,x0=5)

## Problem 4 Find the root of the equation cos(x) = xexp(x)

# solution by bisection method


f4 <- function(x) cos(x) - x*exp(x)
curve(f4,0,1)
bisection(f = f4,lower = 0,upper = 1)

# solution by iteration method


# x = cos(x)/exp(x)
g4 <- function(x) cos(x)/exp(x)
iteration(f = g4,x0 = 5)

# solution by Newton Raphson method


f4 <- function(x) cos(x) - x*exp(x)
f41 <- function(x) -sin(x) -x*exp(x)-exp(x)
NR(f=f4,f1=f41,x0=5)

# numerical integral of f from a to b
# using the trapezoidal rule with n subdivisions

trapezoid <- function(f, a, b, n ) {
h <- (b-a)/n
x <- seq(a, b, by = h)
y <- sapply(x, f)
s <- h*(y[1]/2 + sum(y[2:n]) + y[n+1]/2) # named s rather than T, which is shorthand for TRUE
return(s)
}

f1 <- function(x)
return(4 * x^3)

trapezoid(f1, 0, 1, n = 200)

######################################################################

# numerical integral of f from a to b
# using Simpson's rule with n subdivisions
# f is a function of a single variable
# we assume a < b and n is a positive even integer

simpson <- function(f, a, b, n ) {
n <- max(c(2*(n %/% 2), 4)) # force n to be even and at least 4
h <- (b-a)/n
x1 <- seq(a+h, b-h, by = 2*h) # odd-numbered interior points (weight 4)
x2 <- seq(a+2*h, b-2*h, by = 2*h) # even-numbered interior points (weight 2)
f1 <- sapply(x1, f)
f2 <- sapply(x2, f)
S <- h/3*(f(a) + f(b) + 4*sum(f1) + 2*sum(f2))
return(S)
}

f1 <- function(x) return(4 * x^3)

simpson(f1, 0, 1, 20)

##########################################

trapezoid <- function(fun, a, b, n=100) {
# numerical integral of fun from a to b
# using the trapezoid rule with n subdivisions
# assume a < b and n is a positive integer
# fun must be vectorized, since it is applied to the whole vector x at once
h <- (b-a)/n
x <- seq(a, b, by=h)
y <- fun(x)
s <- h * (y[1]/2 + sum(y[2:n]) + y[n+1]/2)
return(s)
}

f1 <- function(x)
return(4 * x^3)

trapezoid(fun=f1, 0, 1, n = 20)

###################################

simpson <- function(f, a, b, n) {
# numerical integral using Simpson's rule
# assume a < b and n is an even positive integer
h <- (b-a)/n
x <- seq(a, b, by=h)
if (n == 2) {
s <- f(x[1]) + 4*f(x[2]) +f(x[3])
} else {
s <- f(x[1]) + f(x[n+1]) + 2*sum(f(x[seq(3,n-1,by=2)])) + 4 *sum(f(x[seq(2,n, by=2)]))
}
s <- s*h/3
return(s)
}

f1 <- function(x)
return(4 * x^3)

simpson(f=f1, 0, 1, n = 100) # n must be even

####################################

Simpson 3/8

simpson3_8 <- function(fun, a, b, n) {
  # numerical integral using Simpson's 3/8 rule
  # assume a < b and n is a positive multiple of 3
  h <- (b - a) / n
  x <- seq(a, b, by = h)
  if (n == 3) {
    s <- fun(x[1]) + 3*fun(x[2]) + 3*fun(x[3]) + fun(x[4])
  } else {
    # weight 3 on every interior point, reduced to 2 at x[4], x[7], ..., x[n-2]
    s <- fun(x[1]) + fun(x[n+1]) + 3*sum(fun(x[seq(2, n, by = 1)])) -
      sum(fun(x[seq(4, n, by = 3)]))
  }
  s <- s*3*h/8
  return(s)
}

f1 <- function(x) return(4 * x^3)

simpson3_8(fun=f1, 0, 1, n = 99)
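
The composite 3/8 rule can also be written with an explicit weight vector 1, 3, 3, 2, 3, 3, 2, ..., 3, 3, 1, which makes the pattern easier to check by eye. This is a sketch only; the function name simpson38_w and the stopifnot() guard are assumptions added for illustration.

```r
# Simpson's 3/8 rule via a weight vector (hypothetical alternative form)
simpson38_w <- function(f, a, b, n) {
  stopifnot(n >= 3, n %% 3 == 0)           # n must be a multiple of 3
  h <- (b - a) / n
  x <- seq(a, b, length.out = n + 1)
  # interior weights cycle 3, 3, 2; the two endpoints get weight 1
  w <- c(1, rep(c(3, 3, 2), length.out = n - 1), 1)
  3 * h / 8 * sum(w * f(x))
}

f1 <- function(x) 4 * x^3
approx38 <- simpson38_w(f1, 0, 1, 99)
approx38   # the 3/8 rule is exact for cubics, so this should be 1 up to rounding
```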

Matrix Inversion: Gauss-Jordan method

Ab<-matrix(c(2,4,6,2,5,9,6,3,1,0,4,8),nrow=3)

n<-nrow(Ab)

Ab[1,]<-Ab[1,]/Ab[1,1]

Ab[2,]<-Ab[2,]-Ab[2,1]*Ab[1,]

Ab[3,]<-Ab[3,]-Ab[3,1]*Ab[1,]

Ab[2,]<-Ab[2,]/Ab[2,2]

Ab[3,]<-Ab[3,]-Ab[3,2]*Ab[2,]

Program

#####################################

Ab<-matrix(c(2,4,6,2,5,9,6,3,1,0,4,8),nrow=3)

m<-nrow(Ab)

n<-ncol(Ab)

for (j in 1:(m-1)) {
  Ab[j,] <- Ab[j,]/Ab[j,j]
  for (i in (j+1):m)
    Ab[i,] <- Ab[i,] - Ab[i,j]*Ab[j,]
}

Ab[m,] <- Ab[m,]/Ab[m,m]

Ab # The upper triangular matrix of the Gauss Jordan method

for (j in 2:m) {
  for (i in 1:(j-1))
    Ab[i,] <- Ab[i,] - Ab[j,]*Ab[i,j]
}

x<-rep(0,m)

for (i in 1:m)

x[i]<-Ab[i,n]
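
The elimination result can be checked against base R's solve(). In this sketch the first three columns of Ab are taken as the coefficient matrix A and the last column as the right-hand side b; the names A, b, x_ref and A_inv are introduced here for illustration.

```r
# Split the augmented matrix used above into A and b
A <- matrix(c(2, 4, 6, 2, 5, 9, 6, 3, 1), nrow = 3)
b <- c(0, 4, 8)

x_ref <- solve(A, b)   # solves A x = b directly
x_ref                  # 0.8 0.4 -0.4

A_inv <- solve(A)      # the inverse itself, for comparison
A_inv %*% b            # same solution as x_ref
```

The vector x produced by the Gauss-Jordan loops should match x_ref up to rounding.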

###############################################

MLE of Cauchy

n <- 100

u <- runif(n)

x <- tan(pi*(u - 1/2))   # inverse-CDF sample from the standard Cauchy

logc <- function(theta, x) sum(-log(pi) - log(1 + (x - theta)^2))

theta <- seq(0, 10, length.out = 100)

plot(theta, sapply(theta, logc, x), type = "l", ylab = "Log-likelihood of Cauchy")

###

g <-function(theta)

sum(2*(x-theta)/(1+(x-theta)^2))

derg <- function(theta)

sum((2*(x-theta)^2-2)/(1+(x-theta)^2)^2)

newton2 <- function(fun, derf, x0, eps, nlim) {
  iter <- 0
  repeat {
    iter <- iter + 1
    if (iter > nlim) {
      cat(" Iteration Limit Exceeded: Current = ", iter, fill = TRUE)
      x1 <- NA
      break
    }
    x1 <- x0 - fun(x0)/derf(x0)
    # stop when the step is small or the score is essentially zero
    if (abs(x0 - x1) < eps || abs(fun(x1)) < 1.0e-12)
      break
    x0 <- x1
    cat("******Iter. No: ", iter, " Current Iterate =", x1, fill = TRUE)
  }
  return(x1)
}

newton2(g,derg,2.0,.00001,100)

# Now the data vector can be passed as an additional argument to newton3()

gg <- function(theta, x) sum(2*(x - theta)/(1 + (x - theta)^2))

dergg <- function(theta, x) sum((2*(x - theta)^2 - 2)/(1 + (x - theta)^2)^2)

x<-rcauchy(100)

newton3(gg, dergg, 2.0, 0.00001, 100, x)
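
The definition of newton3() does not appear in this part of the notes. The sketch below is one plausible form, assuming it is newton2() with extra arguments forwarded to fun and derf through `...`; the body, the non-finite guard, and the small fixed sample x_demo are assumptions made for illustration.

```r
# Hypothetical reconstruction: Newton iteration that forwards extra
# arguments (here the data vector) to the score and its derivative
newton3 <- function(fun, derf, x0, eps, nlim, ...) {
  for (iter in seq_len(nlim)) {
    x1 <- x0 - fun(x0, ...) / derf(x0, ...)
    if (!is.finite(x1)) return(NA_real_)           # guard against divergence
    if (abs(x0 - x1) < eps || abs(fun(x1, ...)) < 1e-12) return(x1)
    x0 <- x1
  }
  NA_real_                                         # iteration limit reached
}

gg    <- function(theta, x) sum(2 * (x - theta) / (1 + (x - theta)^2))
dergg <- function(theta, x) sum((2 * (x - theta)^2 - 2) / (1 + (x - theta)^2)^2)

# Symmetric toy sample: the score gg is zero at theta = 0
x_demo <- c(-1, 0, 1)
theta_hat <- newton3(gg, dergg, 0.1, 1e-8, 50, x_demo)
theta_hat
```

Newton's method on the Cauchy likelihood is sensitive to the starting value, so a start near the sample median is usually a safer choice than a fixed value.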

# alternatively

optimize(function(theta) sapply(theta, logc, x), c(0, 10), maximum = TRUE)

# extra named arguments are passed on to logc
optimize(logc, interval = c(0, 10), x = x, maximum = TRUE)

# Scaling

# myexp(): sum the series for exp(|x|), then invert if x is negative
myexp <- function(x) {
  y <- abs(x); i <- 0; eps <- 1.e-10; sum <- 1
  repeat {
    i <- i + 1
    term <- y^i/factorial(i)
    sum <- sum + term
    if (term <= eps) break
  }
  if (x < 0) sum <- 1/sum
  return(sum)
}
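
The reason for working with abs(x) is that the series for a large negative x alternates in sign, and cancellation between large terms destroys accuracy. The sketch below compares a naive series with the scaled version at x = -20. The names naive_exp and scaled_exp are assumptions, and the term is updated recursively rather than with factorial(), which differs slightly from myexp().

```r
# Series for exp(x) summed directly, whatever the sign of x
naive_exp <- function(x, eps = 1e-10) {
  total <- 1; term <- 1; i <- 0
  repeat {
    i <- i + 1
    term <- term * x / i            # x^i / i!
    total <- total + term
    if (abs(term) <= eps) break
  }
  total
}

# Scaled version: sum the series at |x|, invert for negative x
scaled_exp <- function(x, eps = 1e-10) {
  s <- naive_exp(abs(x), eps)
  if (x < 0) 1 / s else s
}

x <- -20
rel_naive  <- abs(naive_exp(x)  / exp(x) - 1)
rel_scaled <- abs(scaled_exp(x) / exp(x) - 1)
c(naive = rel_naive, scaled = rel_scaled)   # relative errors against exp()
```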

Trapezoidal rule

f <- function(x) {
  return(1/(1 + x^2))
}

h <- 0.1
a <- 0
b <- 1
x <- seq(a, b, by = h)
y <- f(x)
n <- (b - a)/h
# weights 1, 2, ..., 2, 1
coef <- c(1, rep(2, n-1), 1)
I <- (h/2)*(sum(coef*y))

Simpson’s (1/3) rule

f<-function(x)

{return(1/(1+x^2))}

n <- 100

if (n %% 2 != 0) {
  print("n is not even")
} else {
  a <- 0
  b <- 1
  h <- (b - a)/n
  x <- seq(a, b, by = h)
  y <- f(x)
  # weights 1, 4, 2, 4, ..., 2, 4, 1
  coef <- c(1, rep(c(4, 2), (n-2)/2), 4, 1)
  I <- (h/3)*(sum(coef*y))
}
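
Since the integral of 1/(1+x^2) over [0, 1] is atan(1) = pi/4, both coefficient-vector rules can be checked against a known value. A minimal sketch using the same n for both; the variable names w_trap, w_simp, I_trap and I_simp are introduced here for illustration.

```r
f <- function(x) 1 / (1 + x^2)
a <- 0; b <- 1; n <- 100                    # n must be even for Simpson
h <- (b - a) / n
y <- f(seq(a, b, length.out = n + 1))

w_trap <- c(1, rep(2, n - 1), 1)                        # 1, 2, ..., 2, 1
w_simp <- c(1, rep(c(4, 2), length.out = n - 1), 1)     # 1, 4, 2, ..., 4, 1

I_trap <- h / 2 * sum(w_trap * y)
I_simp <- h / 3 * sum(w_simp * y)
exact  <- atan(1)

c(trapezoid = I_trap - exact, simpson = I_simp - exact)  # signed errors
```

With the same number of function evaluations, the Simpson error should come out several orders of magnitude smaller than the trapezoid error.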
