Howtouser: 1 What Is R
1 What is R
R is open-source statistical software based on the language S. The S
language was originally written at Bell Labs. S-Plus is commercial statistical
software that is also based on S. BOTH S-Plus and R are also programming
languages and hence allow one to do innovative things with the data.
R is free and can be downloaded from the CRAN site (the Windows version is
about 24 MB) at
http://cran.r-project.org/
At the time of writing, the current version is 2.4.1. Make sure to install the reference manual.
The site can be used to download many other packages that address particular
statistical methodologies. However, the packages included in the basic download
are extensive enough for all but the most specialized user. Even though R was
written by statisticians for use in statistics, it can be used for many other
purposes as well. It has superior graphics facilities.
S-Plus is more user-friendly, more buggy, more illogical and more expensive
than R. The two are similar but not the same. In R, many more things need
to be done with commands rather than with a drop-down menu as in S-Plus.
Both programs are Unix-based at heart, though versions of both are available
for Windows, Unix, and the Mac.
2 Getting Started
Double click on the R icon to load the program and use
File → Exit
to quit. Make sure to choose the option "No" when it asks whether to
save the workspace image. If any newly created objects are saved,
they need to be deleted manually later. The command objects() will show
all objects in the workspace and rm(x,y) will remove objects x,y from
the workspace. This command will NOT remove objects from a
workspace image that got saved while exiting a previous time. In order to
remove those objects, go to
c → Program Files → R → R commands window.
and manually delete the saved workspace image.
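The listing and removal commands above can be tried with a short sketch like the following. The objects x and y are throwaway examples, and the names before and after are introduced here only for illustration.

```r
# Create two throwaway objects in the workspace
x <- c(1, 4, 2, 6)
y <- "hello"

before <- objects()   # names of all objects currently in the workspace
rm(x, y)              # remove x and y from the current session
after <- objects()

"x" %in% before       # TRUE
"x" %in% after        # FALSE
```

Note that rm() only affects the running session. It does not touch a workspace image saved earlier.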
R does not automatically load all available commands. It loads only the
commands of the basic statistics and graphics packages.
Various other commands are available in different packages. Most common
users will need nothing more than what gets loaded automatically. Some other
packages besides the basic ones also get downloaded along with R, but stay in
the background. They can be brought forward from the menu. For example,
Packages → Load package → MASS
will load the package MASS from the list. Each time the R session is closed, the
added packages are unloaded. A loaded package can be
detached without closing the session with
detach(package:MASS).
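The same thing can be done from the command line instead of the menu. The sketch below assumes that MASS is installed, which is normally the case since it is distributed with R. The names loaded and still_loaded are made up for this example.

```r
library(MASS)                                  # load MASS, like the menu route
loaded <- "package:MASS" %in% search()         # search() lists what is attached

detach(package:MASS)                           # unload it without closing R
still_loaded <- "package:MASS" %in% search()

loaded         # TRUE
still_loaded   # FALSE
```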
Many other packages are available for download from the site. These are
contributed by various interested parties. They are usually targeted at a
particular statistical methodology. For example, the package boot has bootstrap
functions. At last count, the list of available bundles and packages included 36
items under "a"!
3 Help
• The Help menu in R gives a sub-menu that contains the FAQ on R, the FAQ on
R for Windows, as well as the Introduction to R and the Manuals.
will load the Acrobat Reader. Explanation of the command may appear
somewhat cryptic initially, but it gives examples and hence is quite useful.
• One can also use e.g. help.search("time series") to get information
on the topic of time series. It will list a number of commands used in time
series analysis along with the name of the package in which each command
exists. (It is not an exhaustive list.) The command ?cor can be used to produce
information on how to compute the correlation coefficient because cor(,)
is a valid S command. The command ?regression will give an error message
because regression is not a valid S command.
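As a small illustration of the help commands above, the sketch below looks up cor and then uses it on made-up data. The vectors x and y are invented for the example. The help.search() call is left as a comment because it opens a results window.

```r
# ?cor is shorthand for help("cor"); assigning the result keeps the page from opening
h <- help("cor")

# Search all installed packages for a topic (uncomment to try it)
# help.search("time series")

# Use the command that was just looked up
x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 8)
r <- cor(x, y)   # y is an exact linear function of x, so r is 1
```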
goes by so the command hist, which draws histograms of a vector of observations,
will also work on numeric matrices, but will fail if the columns have headers,
because the data is then not numeric.
Commands such as
is.matrix(m)
will return TRUE if the object m is a matrix and FALSE if it is not. The command
data.class(m) will give the result "matrix" if m is a matrix. Data can be forced
into forms other than its own under certain circumstances. For example, if m
is a numeric matrix, as.vector(m) will make a vector of the matrix entries
reading column-wise. A particular command may not work because the data is
in the wrong format.
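The checks and conversions described above can be seen in a short sketch. The matrix m below is made up and deliberately filled by row so that the column-wise reading of as.vector() is visible.

```r
m <- matrix(1:6, nrow = 2, byrow = TRUE)   # rows are 1 2 3 and 4 5 6

is.matrix(m)        # TRUE
data.class(m)       # "matrix"

v <- as.vector(m)   # reads down the columns: 1 4 2 5 3 6
is.matrix(v)        # FALSE
```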
5 Creating Data
• The easiest type of data to create is a vector. For example,
x <- c(1,4,2,6).
• Command
x <- scan(file="")
will prompt for input. Enter each number. Pressing Enter twice will end
the data input.
• Command
m <- matrix(1:16,ncol=4)
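A quick way to see what the vector and matrix commands produce is sketched below. The second matrix, m2, and its byrow=TRUE option are additions for comparison and do not come from the text above.

```r
x <- c(1, 4, 2, 6)
length(x)                     # 4

m <- matrix(1:16, ncol = 4)   # entries are filled column by column
dim(m)                        # 4 4
m[1, ]                        # first row: 1 5 9 13

m2 <- matrix(1:16, ncol = 4, byrow = TRUE)   # filled row by row instead
m2[1, ]                       # first row: 1 2 3 4
```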
colnames(test) <- c("Age","Wealth").
Once an object named test exists,
edit(test)
will bring it up and one can edit the cells. However, if the edited version
needs to be saved, reassign it using
test <- edit(test).
and then close the edited spreadsheet. It is possible to invoke various
other editors for the spreadsheet. Find help with ?edit.
The data set can be exported with the command
write.table(test,"c:/yash/newdat/test.txt").
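A minimal sketch of naming the columns and exporting follows. The numbers in test are made up, and a temporary folder stands in for c:/yash/newdat so that the example runs on any machine. The names out and written are introduced only for this illustration.

```r
# A small made-up data set with two columns
test <- matrix(c(25, 40, 31, 1000, 5000, 2500), ncol = 2)
colnames(test) <- c("Age", "Wealth")

# Hypothetical location; replace with your own folder
out <- file.path(tempdir(), "test.txt")
write.table(test, out)

written <- readLines(out)   # one header line plus one line per row
length(written)             # 4
```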
7 Import Data
One needs to use different protocols for importing different types of files. The
easiest to import is a .txt file that follows the R protocol.
read.table("c:/yash/newdat/test.txt")
will import the file test.txt (the extension .txt is necessary) from the address
stated. The option header=T MUST be used if and only if the columns
have names. Otherwise the data imports incorrectly. Blissfully, the file path
given to read.table() is NOT case sensitive on Windows, because Windows file
names are not. EVERYTHING ELSE IN R IS CASE SENSITIVE.
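The effect of the header option can be seen in a small round trip. This is only a sketch: the data values are made up, the file goes to a temporary folder rather than c:/yash/newdat, and the names path, good and bad are introduced here for illustration.

```r
# Write a small made-up file whose first line holds the column names
test <- data.frame(Age = c(25, 40, 31), Wealth = c(1000, 5000, 2500))
path <- file.path(tempdir(), "test.txt")
write.table(test, path, row.names = FALSE)

good <- read.table(path, header = TRUE)   # first line is treated as names
names(good)                               # "Age" "Wealth"
nrow(good)                                # 3

bad <- read.table(path)                   # first line is read as data
names(bad)                                # "V1" "V2"
nrow(bad)                                 # 4
```

Without header=TRUE the names end up as an extra row of data, and the columns are no longer numeric.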
9 Graphics
Some demos of graphs from the base package can be viewed with the command
demo(graphics). All graphs need to be drawn with commands, and each graphics
command has plenty of options to fine-tune one's graph. A graphics screen
can be split by par(mfrow=c(2,3)). Successive graphs will be filled by row.
Existing graphs can be augmented with commands such as lines(x,y)
(which joins the points (x, y) with line segments), abline(a,b) (which draws
a line with intercept a and slope b) or abline(h=c(y1,y2,y3))
(which draws three horizontal lines at y = y1, y2, and y3) etc. Text, symbols
and legends can be inserted in an existing graph. The commands locator() and
identify() will make an active cursor in the existing graph and, by successively
clicking on desired points, we can find the co-ordinates of those points.
Commonly used graphs can be drawn with commands plot(x,y), hist(x), boxplot(air)
etc.
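The commands above can be combined in a short sketch. The data x and y are simulated here purely for illustration, and the intercept, slope and heights passed to abline() are arbitrary choices.

```r
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)   # made-up data scattered around a line

par(mfrow = c(2, 3))           # 2 rows by 3 columns, filled by row

plot(x, y)
abline(2, 0.5)                 # line with intercept 2 and slope 0.5

plot(x, y)
lines(x, y)                    # joins successive points with segments

plot(x, y)
abline(h = c(5, 10, 15))       # three horizontal lines

hist(y)
boxplot(y)

layout_now <- par("mfrow")     # the current split: 2 3
```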
11 Standard Distributions
For a list of available probability distributions and their R names, see table on
page 37 of the Introduction to R.
Help → Manuals
→ An introduction to R
→ Probability Distributions
will get the list. For each one of these distributions, the probability density,
distribution function, quantiles, and simulated values are obtained
by adding d, p, q and r respectively in front of the R name of the distribution
(for example dnorm, pnorm, qnorm and rnorm for the normal distribution).
The table also gives the parameters of these distributions. Any arbitrary discrete
distribution can be simulated with sample(x,50,replace=T,prob=p).
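For the standard normal distribution, whose R name is norm, the four prefixes work as sketched below. The discrete distribution at the end uses made-up values x and probabilities p.

```r
dnorm(0)        # density at 0, equal to 1/sqrt(2*pi)
pnorm(0)        # distribution function at 0: 0.5
qnorm(0.5)      # the 0.5 quantile (median): 0
z <- rnorm(5)   # five simulated standard normal values

# An arbitrary discrete distribution on the values 1, 2, 3
x <- c(1, 2, 3)
p <- c(0.2, 0.5, 0.3)
s <- sample(x, 50, replace = TRUE, prob = p)
table(s)        # counts of each value in the simulated sample
```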
12 Programming in R
The greatest payoff of R is that it is also a programming language. Small programs
can be written in a Notepad file and then run in the R console
by copying and pasting the program. For example, the function meanunif simulates
n U(0, 1) random variables, computes their mean, and repeats the process
N times.
meanunif <- function(n, N) {
  # Returns N sample means, each computed from n U(0,1) draws
  x <- numeric(N)
  for (i in seq_len(N)) {
    x[i] <- mean(runif(n))
  }
  x
}
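A function like this is called by its name once it has been pasted into R. The sketch below uses a loop-free variant built on replicate(). The name meanunif_vec is made up here, and the variant is offered only as one possible alternative that does the same job.

```r
# Same idea as meanunif, written with replicate() instead of a for loop
meanunif_vec <- function(n, N) {
  replicate(N, mean(runif(n)))
}

set.seed(123)                 # makes the simulation repeatable
m <- meanunif_vec(10, 1000)   # 1000 means, each of 10 uniform draws

length(m)                     # 1000
mean(m)                       # should be close to 0.5
hist(m)                       # the means pile up around 0.5
```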