R
> z<-3.5-8i
> y<-3.5-8i
> Re(y)+Re(z)
[1] 7
> Im(y)+Im(z)
[1] -16
> Mod(z)
[1] 8.732125
> Mod(Im(z))
[1] 8
> h<-tan(3/2)
>h
[1] 14.10142
>h
[1] 2.237161
>h
[1] -6.405331
> as.complex(4)
[1] 4+0i
> as.complex(4.0)
[1] 4+0i
> as.complex(Mod(Im(z)))
[1] 8+0i
> richot(8.8,1)
[1] 9
> richot(8.8,0)
[1] 8
[1] 11
> a= function(x)
+ x+2
> a(10)
[1] 12
> richot(9.7,1)
[1] 10
> a=2
> a(9)
>a<2
[1] FALSE
>a>2
[1] FALSE
>a>1
[1] TRUE
>a<1
[1] FALSE
> a>1
[1] TRUE
> a==1
[1] FALSE
> a%%2
[1] 0
> a%/%2
[1] 1
> x=c(1,2,3,4)
> is.integer(x)
[1] FALSE
> is.numeric(x)
[1] TRUE
> x=c(1,2,3,4,5)
> y=c(1,2,3,4,5,6,7,8,9,0,11,22,33,44,55,66,77,88,99,100,12,13,14,15,16,17,18,19,20)
>y
[1] 1 2 3 4 5 6 7 8 9 0 11 22 33 44 55 66 77 88 99 100 12 13 14
[24] 15 16 17 18 19 20
> y=as.integer(y)
> x=as.integer(x)
> x=c(1,2)
> is.integer(x)
[1] FALSE
>y
[1] 1 2 3 4 5 6 7 8 9 0 11 22 33 44 55 66 77 88 99 100 12 13 14
[24] 15 16 17 18 19 20
> y=c(1,2,3,4,5,6,7,8,9,0,11,22,33,44,55,66,77,88,99,100,12,13,14)
>y
[1] 1 2 3 4 5 6 7 8 9 0 11 22 33 44 55 66 77 88 99 100 12 13 14
>
> y=c("khaskj","khaskj","khewrewrewaskj","khasrtwtwwtkj","khaskqqewqeqwj","khaskjryhryhry","khaskjefeaf")
>y
> class(y)
[1] "character"
> y=factor(c("khaskj","khaskj","khewrewrewaskj","khasrtwtwwtkj","khaskqqewqeqwj","khaskjryhryhry","khaskjefeaf"))
>y
[7] khaskjefeaf
> class(y)
[1] "factor"
> isClass(y)
> is.Class(y)
> is.Class
> mode(y)
[1] "numeric"
Error: '\U' used without hex digits in character string starting ""C:\U"
> 119 %% 10
[1] 9
[1] FALSE
[1] FALSE
>
>1+1
[1] 2
> example(min)
>help(min)
> list.files()
> source("bottle1.R")
> c(1,TRUE,"THREE")
Sequence vectors
> 5:9
[1] 5 6 7 8 9
> seq(5,9)
[1] 5 6 7 8 9
> seq(5,9,0.5)
[1] 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0
Vector Access
We're going to create a vector with some strings in it for you, and store it in the sentence variable.
You can retrieve an individual value within a vector by providing its numeric index in square brackets.
> sentence[3]
[1] "plank"
You can assign new values within an existing vector. Try changing the third word to "dog":
> sentence[3]="dog"
If you add new values onto the end, the vector will grow to accommodate them. Let's add a fourth
word:
> sentence[4]="to"
You can use a vector within the square brackets to access multiple values. Try getting the first and third
words:
> sentence[c(1,3)]
> sentence[2:4]
You can also set ranges of values; just provide the values in a vector. Add words 5 through 7:
> sentence[5:7]=c("the","poop","deck")
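The indexing steps above can be replayed in one short sketch. The first two words of the starting vector are an assumption; the notes only confirm that the third word is "plank".

```r
# Assumed starting vector: only "plank" at index 3 is confirmed by the notes.
sentence <- c("walk", "the", "plank")

sentence[3] <- "dog"                        # replace the third word
sentence[4] <- "to"                         # assigning past the end grows the vector
sentence[5:7] <- c("the", "poop", "deck")   # assign a range of positions at once

print(sentence[c(1, 3)])   # first and third words
print(sentence[2:4])       # words 2 through 4
print(length(sentence))    # the vector now holds 7 words
```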
Vector Names
You can assign names to a vector's elements by passing a second vector filled with names to the names
assignment function, like this:
> ranks <- 1:3
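The notes stop before the names are actually assigned. A minimal sketch of the idea follows; the label strings are hypothetical and chosen only for illustration.

```r
ranks <- 1:3

# Hypothetical labels: any character vector of the same length works.
names(ranks) <- c("first", "second", "third")
print(ranks)

print(ranks["second"])   # look a value up by its name
ranks["third"] <- 4L     # assign through a name as well
print(ranks)
```

Once names are set, they can be used anywhere a numeric index would be accepted.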
barplot
We'll make a new vector for you, and store it in the vesselsSunk variable.
> vesselsSunk=1:3
> barplot(vesselsSunk)
> vesselsSunk <- c(4, 5, 1)  # or 5:7; the values become the bar heights (4, 5, 1 or 5, 6, 7)
> barplot(vesselsSunk)
Vector Math
> a=7:9
> a+1
[1] 8 9 10
>a
[1] 7 8 9
> a=a+1
>a
[1] 8 9 10
>a/2
>a*2
[1] 16 18 20
> a+(a*2)
[1] 24 27 30
>a-(a*2)
[1] -8 -9 -10
> c(99,99,99)<a
> sin(a)
> sqrt(a)
Scatter Plots
The plot function takes two vectors, one for X values and one for Y values, and draws a graph of them.
Let's draw a graph showing the relationship of numbers and their sines.
First, we'll need some sample data. We'll create a vector for you with some fractional values between 0
and 20, and store it in the x variable.
Now, try creating a second vector with the sines of those values:
> plot(x, y)
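The commands that build x and y are missing from the notes. One plausible version is sketched here; the 0.1 spacing is an assumption, since the text only says "fractional values between 0 and 20".

```r
# Assumed spacing of 0.1; the notes do not give the exact sequence.
x <- seq(0, 20, by = 0.1)
y <- sin(x)      # sin is vectorized, so this computes one sine per element of x

plot(x, y)       # x on the horizontal axis, y on the vertical axis
```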
We'll create a vector with some negative and positive values for you, and store it in the values variable.
We'll also create a second vector with the absolute values of the first, and store it in the absolutes
variable.
Try plotting the vectors, with values on the horizontal axis, and absolutes on the vertical axis.
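A sketch of this exercise, assuming the values run from -10 to 10 (the notes do not show the actual vector):

```r
# Assumed range; any mix of negative and positive numbers shows the same V shape.
values <- -10:10
absolutes <- abs(values)

plot(values, absolutes)   # values on the horizontal axis, absolutes on the vertical
```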
Sometimes, when working with sample data, a given value isn't available. But it's not a good idea to just
throw those values out. R has a value that explicitly indicates a sample was not available: NA. Many
functions that work with vectors treat this value specially.
We'll create a vector for you with a missing sample, and store it in the a variable.
Try to get the sum of its values, and see what the result is:
> sum(a)
[1] NA
The sum is considered "not available" by default because one of the vector's values was NA. This is the
responsible thing to do; R won't just blithely add up the numbers without warning you about the
incomplete data. We can explicitly tell sum (and many other functions) to remove NA values before they
do their calculations, however.
As you see in the documentation, sum can take an optional named argument, na.rm. It's set to FALSE by
default, but if you set it to TRUE, all NA arguments will be removed from the vector before the
calculation is performed.
> sum(a, na.rm=TRUE)
[1] 20
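The same behaviour carries over to other summary functions. The sketch below uses an assumed vector (the notes never show the contents of a) whose non-missing values add up to 20.

```r
# Assumed contents; only the NA result and the total of 20 appear in the notes.
a <- c(1, 3, NA, 7, 9)

print(sum(a))                  # NA: one missing value makes the whole sum unknown
print(sum(a, na.rm = TRUE))    # drop NA values before summing
print(mean(a, na.rm = TRUE))   # mean accepts the same argument
print(is.na(a))                # logical vector marking which entries are missing
```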
Riley third chapter
MATRICES
So far we've only worked with vectors, which are simple lists of values. What if you need data in rows
and columns? Matrices are here to help.
A matrix is just a fancy term for a 2-dimensional array. In this chapter, we'll show you all the basics of
working with matrices, from creating them, to accessing them, to plotting them.
Let's make a matrix 3 rows high by 4 columns wide, with all its fields set to 0.
> matrix(0, 3, 4)
[1,] 0 0 0 0
[2,] 0 0 0 0
[3,] 0 0 0 0
> matrix(3,4)
[,1]
[1,] 3
[2,] 3
[3,] 3
[4,] 3
> matrix(a,3,4)
[1,] 8 8 8 8
[2,] 9 9 9 9
[3,] 10 10 10 10
> matrix(b,3,4)
> matrix('b',3,4)
You can also use a vector to initialize a matrix's value. To fill a 3x4 matrix, you'll need a 12-item vector.
We'll make that for you now:
> a=1:12
If we print the value of a, we'll see the vector's values, all in a single row:
> print(a)
[1] 1 2 3 4 5 6 7 8 9 10 11 12
Now call matrix with the vector, the number of rows, and the number of columns:
> matrix(a, 3, 4)
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
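As the output shows, matrix fills column by column. A short sketch comparing that default with the byrow argument of matrix, in case a row-by-row fill is wanted:

```r
a <- 1:12

by_col <- matrix(a, nrow = 3, ncol = 4)                 # default: fill down each column
by_row <- matrix(a, nrow = 3, ncol = 4, byrow = TRUE)   # fill across each row instead

print(by_col)
print(by_row)
```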
The vector's values are copied into the new matrix, one by one. You can also re-shape the vector itself
into a matrix. Create an 8-item vector:
The dim assignment function sets dimensions for a matrix. It accepts a vector with the number of rows
and the number of columns to assign.
Assign new dimensions to plank by passing a vector specifying 2 rows and 4 columns (c(2, 4)):
If you print plank now, you'll see that the values have shifted to form 2 rows by 4 columns:
[1,] 1 3 5 7
[2,] 2 4 6 8
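The commands for this step are not in the notes. A sketch that reproduces the printed result, assuming plank starts as the vector 1:8:

```r
plank <- 1:8             # assumed starting vector, consistent with the output above
dim(plank) <- c(2, 4)    # reshape in place: 2 rows, 4 columns

print(plank)
print(is.matrix(plank))  # the vector is now treated as a matrix
```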
Matrix Access
Getting values from matrices isn't that different from vectors; you just have to provide two indices
instead of one.
Let's take another look at our plank matrix:
> print(plank)
[1,] 1 3 5 7
[2,] 2 4 6 8
Try getting the value from the second row in the third column of plank:
> plank[2, 3]
[1] 6
Now, try getting the value from the first row of the fourth column:
> plank[1,4]
[1] 7
As with vectors, to set a single value, just assign to it. Set the previous value to 0:
You can get an entire row of the matrix by omitting the column index (but keep the comma). Try
retrieving the second row:
> plank[2,]
[1] 2 4 6 8
To get an entire column, omit the row index. Retrieve the fourth column:
> plank[, 4]
[1] 7 8
You can read multiple rows or columns by providing a vector or sequence with their indices. Try
retrieving columns 2 through 4:
[1,] 3 5 7
[2,] 4 6 8
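Two commands are missing from this section: the assignment and the multi-column read. A sketch of both, done on a copy so that plank keeps the values printed above (the name patched is made up for this example):

```r
plank <- matrix(1:8, nrow = 2, ncol = 4)

print(plank[, 2:4])   # columns 2 through 4, returned as a 2 x 3 matrix

# Assignment uses the same [row, column] form as reading.
patched <- plank      # hypothetical copy, so plank itself is unchanged
patched[1, 4] <- 0
print(patched[, 4])   # fourth column after the change
```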
MATRIX PLOTTING
Text output is only useful when matrices are small. When working with more complex data, you'll need
something better. Fortunately, R includes powerful visualizations for matrix data.
The beach we're mapping is pretty flat; everything is 1 meter above sea level. We'll create a 10 by 10 matrix with all its values
initialized to 1 for you:
Oh, wait, we forgot the spot where we dug down to sea level to retrieve a treasure chest. At the fourth
row, sixth column, set the elevation to 0:
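The two setup commands are not shown in the notes. A sketch that yields the elevation matrix printed further down:

```r
elevation <- matrix(1, 10, 10)   # 10 x 10 matrix, every cell set to 1
elevation[4, 6] <- 0             # the spot dug down to sea level

print(elevation[4, ])            # row 4 now contains a single 0
```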
You can now do a contour map of the values simply by passing the matrix to the contour function:
> contour(elevation)
> elevation
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 1 1 1 1 1 1
[2,] 1 1 1 1 1 1 1 1 1 1
[3,] 1 1 1 1 1 1 1 1 1 1
[4,] 1 1 1 1 1 0 1 1 1 1
[5,] 1 1 1 1 1 1 1 1 1 1
[6,] 1 1 1 1 1 1 1 1 1 1
[7,] 1 1 1 1 1 1 1 1 1 1
[8,] 1 1 1 1 1 1 1 1 1 1
[9,] 1 1 1 1 1 1 1 1 1 1
[10,] 1 1 1 1 1 1 1 1 1 1
>persp(elevation)
The perspective plot looks a little odd, though. This is because persp
automatically expands the view so that your highest value (the beach surface) is
at the very top.
We can fix that by specifying our own value for the expand parameter.
>persp(elevation,expand=0.2)
Okay, those examples are a little simplistic. Thankfully, R includes some sample
data sets to play around with. One of these is volcano, a 3D map of a dormant
New Zealand volcano.
It's simply an 87x61 matrix with elevation values, but it shows the power of R's
matrix visualizations.
>contour(volcano)
Try a perspective plot (limit the vertical expansion to one-fifth again):
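A sketch of that call, reusing the expand value from the elevation example:

```r
print(dim(volcano))            # built-in data set: 87 rows, 61 columns

persp(volcano, expand = 0.2)   # limit vertical expansion to one-fifth
```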
Determining the health of the crew is an important part of any inventory of the
ship. Here's a vector containing the number of limbs each member has left, along
with their names.
Standard deviation
Some of the plunder from our recent raids has been worth less than what we're used to.
Here's a vector with the values of our latest hauls:
> pounds <- c(45000, 50000, 35000, 40000, 35000, 45000, 10000, 15000)
> barplot(pounds)
> meanValue <- mean(pounds)
Let's see a plot showing the mean value:
[Plot: bar chart of pounds; vertical axis from 0 to 50000]
Statisticians use the concept of "standard deviation" from the mean to describe the range
of typical values for a data set. For a group of numbers, it shows how much they typically
vary from the average value. To calculate the standard deviation, you calculate the mean
of the values, then subtract the mean from each number and square the result, then
average those squares (R's sd divides their sum by n - 1 rather than n), and take the
square root of that average.
If that sounds like a lot of work, don't worry. You're using R, and all you have to do is
pass a vector to the sd function. Try calling sd on the pounds vector now, and assign the
result to the deviation variable:
> deviation <- sd(pounds)
> abline(h = meanValue + deviation)
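To make the recipe concrete, the sketch below computes the standard deviation by hand and compares it with sd. The names squared_diffs and manual_sd are made up for this example; note the division by n - 1, which is what sd uses.

```r
pounds <- c(45000, 50000, 35000, 40000, 35000, 45000, 10000, 15000)
meanValue <- mean(pounds)

# Manual calculation: squared distances from the mean, divided by n - 1.
squared_diffs <- (pounds - meanValue)^2
manual_sd <- sqrt(sum(squared_diffs) / (length(pounds) - 1))

deviation <- sd(pounds)
print(c(manual = manual_sd, builtin = deviation))   # the two values agree

# Draw the bars with the mean and one deviation on either side.
barplot(pounds)
abline(h = meanValue)
abline(h = meanValue + deviation)
abline(h = meanValue - deviation)
```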
Chapter 5 FACTORS
Creating factors
Often your data needs to be grouped by category: blood pressure by age range,
accidents by auto manufacturer, and so forth. R has a special collection type
called a factor to track these categorized values.
It's time to take inventory of the ship's hold. We'll make a vector for you with the
type of booty in each chest.
To categorize the values, simply pass the vector to the factor function:
Printed at the bottom, you'll see the factor's "levels" - groups of unique values. Notice
also that there are no quotes around the values. That's because they're not strings; they're
actually integer references to one of the factor's levels.
Let's take a look at the underlying integers. Pass the factor to the as.integer function:
> as.integer(types)
[1] 2 3 1 2 1
You can get only the factor levels with the levels function:
> levels(types)
[1] "gems" "gold" "silver"
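The commands that create the factor are missing from the notes. The sketch below reconstructs a chests vector that is consistent with the as.integer and levels output shown above; the name chests itself is an assumption.

```r
# Reconstructed from the outputs above: codes 2 3 1 2 1 with levels gems, gold, silver.
chests <- c("gold", "silver", "gems", "gold", "gems")
types <- factor(chests)

print(types)               # values without quotes, levels listed at the bottom
print(as.integer(types))   # the underlying integer codes
print(levels(types))       # the unique categories, sorted alphabetically
```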
You can use a factor to separate plots into categories. Let's graph our five chests by
weight and value, and show their type as well. We'll create two vectors for
you; weights will contain the weight of each chest, and prices will track how much
the chests are worth.
> weights <- c(300, 200, 100, 250, 150)
Now, try calling plot to graph the chests by weight and value.
The legend function takes a location to draw in, a vector with label names, and a vector
with numeric plot character IDs.
> legend("topright", c("gems", "gold", "silver"), pch=1:3)
If you hard-code the labels and plot characters, you'll have to update them every time you
change the plot factor. Instead, it's better to derive them by using the levels function on
your factor:
> legend("topright", levels(types), pch=1:length(levels(types)))
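Putting the pieces together, a sketch of the full plot might look like this. The prices are taken from the treasure[[2]] output in the data frames chapter, the chest types are reconstructed from the as.integer output, and using the factor's integer codes for pch is an assumption about how the plot characters were chosen.

```r
weights <- c(300, 200, 100, 250, 150)
prices <- c(9000, 5000, 12000, 7500, 18000)   # values from the treasure[[2]] output
types <- factor(c("gold", "silver", "gems", "gold", "gems"))

# Assumed approach: use each chest's integer level code as its plot character.
plot(weights, prices, pch = as.integer(types))

# Derive labels and symbols from the factor so the legend stays in sync.
legend("topright", levels(types), pch = seq_along(levels(types)))
```

seq_along(levels(types)) gives the same result as 1:length(levels(types)) here and also behaves sensibly if the factor has no levels.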
CHAPTER 6 DATA FRAMES
The weights, prices, and types data structures are all deeply tied together, if you think about it. If
you add a new weight sample, you need to remember to add a new price and type, or risk
everything falling out of sync. To avoid trouble, it would be nice if we could tie all these
variables together in a single data structure.
Fortunately, R has a structure for just this purpose: the data frame. You can think of a data frame
as something akin to a database table or an Excel spreadsheet. It has a specific number of
columns, each of which is expected to contain values of a particular type. It also has an
indeterminate number of rows - sets of related values for each column.
Our vectors with treasure chest data are perfect candidates for conversion to a data frame. And
it's easy to do. Call the data.frame function, and pass weights, prices, and types as the arguments.
Assign the result to the treasure variable:
> print(treasure)
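The data.frame call itself is missing from the notes. A sketch under the same assumptions as before (chest types reconstructed from the factor output, prices taken from the treasure[[2]] output):

```r
weights <- c(300, 200, 100, 250, 150)
prices <- c(9000, 5000, 12000, 7500, 18000)
types <- factor(c("gold", "silver", "gems", "gold", "gems"))

# Each vector becomes a column; column names come from the variable names.
treasure <- data.frame(weights, prices, types)

print(treasure)
str(treasure)   # shows the type of each column
```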
Just like matrices, it's easy to access individual portions of a data frame.
You can get individual columns by providing their index number in double-brackets. Try
getting the second column (prices) of treasure:
> treasure[[2]]
[1] 9000 5000 12000 7500 18000
You could instead provide a column name as a string in double-brackets. (This is often
more readable.) Retrieve the "weights" column:
> treasure[["weights"]]
> treasure[1,2]
[1] 9000
> treasure[1]
weights
1 300
2 200
3 100
4 250
5 150
> treasure[,1]
> treasure[1,]
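The access forms above return different kinds of objects, which is easy to miss in the transcript. A self-contained sketch (the treasure frame is rebuilt here with only two columns for brevity):

```r
treasure <- data.frame(
  weights = c(300, 200, 100, 250, 150),
  prices  = c(9000, 5000, 12000, 7500, 18000)
)

print(treasure[["prices"]])   # a plain vector
print(treasure$prices)        # shorthand for the same column
print(treasure[1, 2])         # single cell: row 1, column 2
print(class(treasure[1]))     # single brackets keep a one-column data frame
print(class(treasure[, 1]))   # with the comma, the column comes back as a vector
print(treasure[1, ])          # first row, still a data frame
```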
Typing in all your data by hand only works up to a point, obviously, which is why R was
given the capability to easily load data in from external files.
"Port","Population","Worth"
"Cartagena",35000,10000
"Porto Bello",49000,15000
"Havana",140000,50000
"Panama City",105000,35000
You can load a CSV file's content into a data frame by passing the file name to
the read.csv function. Try it with the "targets.csv" file:
> read.csv("targets.csv")
5. Cartagena 500
7. Havana 2000
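To try read.csv without the course files, the sketch below writes the CSV lines listed above to a temporary file and reads them back. The temporary path is an assumption standing in for "targets.csv".

```r
# Temporary stand-in for targets.csv, filled with the lines shown in the notes.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"Port","Population","Worth"',
  '"Cartagena",35000,10000',
  '"Porto Bello",49000,15000',
  '"Havana",140000,50000',
  '"Panama City",105000,35000'
), csv_path)

targets <- read.csv(csv_path)   # the first line is treated as column headers
print(targets)
```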
For files that use separator strings other than commas, you can use
the read.table function. The sep argument defines the separator character, and you
can specify a tab character with "\t".
Call read.table on "infantry.txt", using tab separators:
> read.table("infantry.txt", sep="\t")
V1 V2
1 Port Infantry
3 Cartagena 500
5 Havana 2000
Notice the "V1" and "V2" column headers? The first line is not automatically treated as
column headers with read.table. This behavior is controlled by the header argument.
Call read.table again, setting header to TRUE:
> read.table("infantry.txt", sep="\t", header=TRUE)
Port Infantry
2 Cartagena 500
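The effect of header is easy to check with a small tab-separated file. This sketch writes only the two ports visible in the output above to a temporary file (an assumed stand-in for "infantry.txt") and reads it both ways.

```r
# Temporary stand-in for infantry.txt, using only rows that appear in the notes.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("Port\tInfantry", "Cartagena\t500", "Havana\t2000"), txt_path)

no_header <- read.table(txt_path, sep = "\t")
with_header <- read.table(txt_path, sep = "\t", header = TRUE)

print(no_header)     # columns named V1 and V2; the header line becomes a data row
print(with_header)   # columns named Port and Infantry
```

Without header=TRUE the Infantry column is also read as text, because its first entry is the word "Infantry".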
We want to loot the city with the most treasure and the fewest guards. Right now,
though, we have to look at both files and match up the rows. It would be nice if all
the data for a port were in one place...
R's merge function can accomplish precisely that. It joins two data frames
together, using the contents of one or more columns. First, we're going to store
those file contents in two data frames for you, targets and infantry.
The merge function takes arguments with an x frame (targets) and a y frame
(infantry). By default, it joins the frames on columns with the same name (the
two Port columns). See if you can merge the two frames:
> merge(x = targets, y = infantry)
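A self-contained sketch of the join follows. The targets rows come from the CSV listing above; the infantry frame holds only the two ports visible in the notes, so the default merge keeps just those two. The all.x variant is an addition for illustration.

```r
targets <- data.frame(
  Port = c("Cartagena", "Porto Bello", "Havana", "Panama City"),
  Population = c(35000, 49000, 140000, 105000),
  Worth = c(10000, 15000, 50000, 35000)
)
# Only the infantry rows that are visible in the notes.
infantry <- data.frame(Port = c("Cartagena", "Havana"), Infantry = c(500, 2000))

merged <- merge(x = targets, y = infantry)   # joins on the shared Port column
print(merged)

# Keep every port from targets; missing infantry counts become NA.
all_ports <- merge(targets, infantry, all.x = TRUE)
print(all_ports)
```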
So far, we've been working purely in the abstract. It's time to take a look at some real
data, and see if we can make any observations about it.
Country,Piracy
Australia,23
Bangladesh,90
Brunei,67
China,77
...
We'll load that into the piracy data frame for you:
> piracy <- read.csv("piracy.csv")
We also have another file with GDP per capita for each country (wealth produced,
divided by population):
1 Liechtenstein 141100
2 Qatar 104300
3 Luxembourg 81100
4 Bermuda 69900
...
That will go into the gdp frame:
> gdp <- read.table("gdp.txt", sep=" ", header=TRUE)
We'll merge the frames on the country names:
R can test for correlation between two vectors with the cor.test function. Try calling it on
the GDP and Piracy columns of the countries data frame:
> cor.test(countries$GDP, countries$Piracy)
-0.8736179 -0.7475690
sample estimates:
cor
-0.8203183
The key result we're interested in is the "p-value". Conventionally, any correlation with a
p-value less than 0.05 is considered statistically significant, and this sample data's p-value
is definitely below that threshold. In other words, yes, these data do show a statistically
significant negative correlation between GDP and software piracy.
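Since the piracy and GDP files are not included here, the sketch below runs cor.test on R's built-in mtcars data set as a stand-in (car weight against fuel economy). It shows how to pull the estimate, p-value, and confidence interval out of the result.

```r
# Stand-in data: mtcars ships with R; it is not the piracy data from the notes.
result <- cor.test(mtcars$wt, mtcars$mpg)

print(result$estimate)   # the correlation coefficient ("cor")
print(result$p.value)    # compare against the conventional 0.05 threshold
print(result$conf.int)   # 95 percent confidence interval for the correlation

significant <- result$p.value < 0.05
print(significant)
```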
We have more countries represented in our GDP data than we do our piracy rate data. If
we know a country's GDP, can we use that to estimate its piracy rate?
We can, if we calculate the linear model that best represents all our data points (with a
certain degree of error). The lm function takes a model formula, which is represented by
a response variable (piracy rate), a tilde character (~), and a predictor variable (GDP).
(Note that the response variable comes first.)
Try calculating the linear model for piracy rate by GDP, and assign it to
the line variable:
> line <- lm(countries$Piracy ~ countries$GDP)
> abline(line)
Now, if we know a country's GDP, we should be able to make a reasonable prediction of
how common piracy is there!
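The same workflow can be tried on built-in data. This sketch fits a linear model on mtcars as a stand-in for the piracy data, draws the fitted line, and makes one prediction; the names fit and predicted are made up for this example.

```r
# Stand-in data: fuel economy (response) modeled by car weight (predictor).
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))               # intercept and slope of the fitted line

plot(mtcars$wt, mtcars$mpg)
abline(fit)                    # draw the fitted line over the points

# Estimate the response for a new predictor value.
predicted <- predict(fit, newdata = data.frame(wt = 3))
print(predicted)
```

Passing data = mtcars with bare column names in the formula is what lets predict accept a new data frame with a matching wt column.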
ggplot2
The functionality we've shown you so far is all included with R by default. (And it's
pretty powerful, isn't it?) But in case the default installation doesn't include that function
you need, there are still more libraries available on the servers of the Comprehensive R
Archive Network, or CRAN. They can add anything from new statistical functions to
better graphics capabilities. Better yet, installing any of them is just a command away.
> install.packages("ggplot2")
--- Please select a CRAN mirror for use in this session ---
opened URL
==================================================
downloaded 2.2 Mb
* installing *source* package 'ggplot2' ...
** R
** data
** inst
** help
* DONE (ggplot2)
You can get help for a package by calling the help function and passing the package
name in the package argument. Try displaying help for the "ggplot2" package:
> help(package = "ggplot2")
Description:
Package: ggplot2
Type: Package
Version: 0.9.1
...
Here's a quick demo of the power you've just added to R. To use it, let's revisit some data
from a previous chapter.
> weights <- c(300, 200, 100, 250, 150)
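The plotting command from the course is not preserved in the notes. As one possible sketch, the chest data can be drawn with ggplot2's ggplot and geom_point, colored by type. The chests_df name is made up, the type values are reconstructed from the factor chapter, and the plot is skipped if ggplot2 is not installed.

```r
weights <- c(300, 200, 100, 250, 150)
prices <- c(9000, 5000, 12000, 7500, 18000)
types <- factor(c("gold", "silver", "gems", "gold", "gems"))
chests_df <- data.frame(weights, prices, types)   # hypothetical name

# Only attempt the plot when the package is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(chests_df, aes(x = weights, y = prices, colour = types)) +
    geom_point()
  print(p)   # ggplot objects are drawn when printed
}
```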
ggplot2 is just the first of many powerful packages awaiting discovery on CRAN. And
of course, there's much, much more functionality in the standard R libraries. This course
has only scratched the surface!