R

The document contains examples of working with matrices and vectors in R. It shows how to create matrices with specified dimensions and values, access elements within matrices using row and column indices, perform basic math operations on matrices and vectors, and plot vectors and matrices. It also demonstrates using NA values and removing them when calculating statistics.

Uploaded by

ankit_jolly_2


> z=3.5-8i

> y < - 3.5-8i

Error: object 'y' not found

> y <- 3.5-8i

> y<-3.5-8i

> re(y)+re(z)

Error: could not find function "re"

> Re(y)+Re(z)

[1] 7

> Im(y)+Im(z)

[1] -16

> Mod(z)

[1] 8.732125

> Mod(Im(z))

[1] 8

> h<-tan(3/2)

> h

[1] 14.10142

> h<- tan(1200/60)

> h

[1] 2.237161

> h<- tan(1800/60)

> h

[1] -6.405331

> as.complex(4)

[1] 4+0i

> as.complex(4.0)
[1] 4+0i

> as.complex(Mod(Im(z)))

[1] 8+0i

> richot= function(x,y) if(y=1) ceiling(x) else floor(x)

Error: unexpected '=' in "richot= function(x,y) if(y="

> richot<- function(x,y) if(y=1) ceiling(x) else floor(x)

Error: unexpected '=' in "richot<- function(x,y) if(y="

> richot<- function(x,y) if(y==1) ceiling(x) else floor(x)

> richot(8.8,1)

[1] 9

> richot(8.8,0)

[1] 8

> 119 %/% 10

[1] 11

> a= function(x)

+ x+2

> a(10)

[1] 12

> richot(9.7,1)

[1] 10

> a=2

> a(9)

Error: could not find function "a"

>a<2

[1] FALSE

>a>2

[1] FALSE
>a>1

[1] TRUE

>a<1

[1] FALSE

> a>1

[1] TRUE

> a==1

[1] FALSE

> a%%2

[1] 0

> a%/%2

[1] 1

> x=c(1,2,3,4)

> x=c(1,2,3,4) is integer.x

Error: unexpected symbol in "x=c(1,2,3,4) is"

> x=c(1,2,3,4) is.integer(x)

Error: unexpected symbol in "x=c(1,2,3,4) is.integer"

> x=1 is.integer(x)

Error: unexpected symbol in "x=1 is.integer"

> is.integer(x)

[1] FALSE

> is.numeric(x)

[1] TRUE

> x=g(1,2,3,4,5)

Error: could not find function "g"

> y=c(1,2,3,4,5,6,7,8,9,0,11,22,33,44,55,66,77,88,99,100,12,13,14,15,16,17,18,19,20)

> y

[1] 1 2 3 4 5 6 7 8 9 0 11 22 33 44 55 66 77 88 99 100 12 13 14

[24] 15 16 17 18 19 20
> y=integer(y)

Error in integer(y) : invalid 'length' argument

> x=integer(x)

Error in integer(x) : invalid 'length' argument

> x=c(1,2)

> is.integer(x)

[1] FALSE

> y

[1] 1 2 3 4 5 6 7 8 9 0 11 22 33 44 55 66 77 88 99 100 12 13 14

[24] 15 16 17 18 19 20

> y=(1,2,3,4,5,6,7,8,9,0,11,22,33,44,55,66,77,88,99,100,12,13,14)

Error: unexpected ',' in "y=(1,"

> y=c(1,2,3,4,5,6,7,8,9,0,11,22,33,44,55,66,77,88,99,100,12,13,14)

> y

[1] 1 2 3 4 5 6 7 8 9 0 11 22 33 44 55 66 77 88 99 100 12 13 14

> y=c(khaskj,ahsja,kajsojoiaj,jsaojso,asiojsoiaj,oisjaoij,jasoijsoi,oisajij)

Error: object 'khaskj' not found

> y=c("khaskj","khaskj","khewrewrewaskj","khasrtwtwwtkj","khaskqqewqeqwj","khaskjryhryhry","khaskjefeaf")

> y

[1] "khaskj" "khaskj" "khewrewrewaskj" "khasrtwtwwtkj" "khaskqqewqeqwj"

[6] "khaskjryhryhry" "khaskjefeaf"

> class(y)

[1] "character"

> y=r(c("khaskj","khaskj","khewrewrewaskj","khasrtwtwwtkj","khaskqqewqeqwj","khaskjryhryhry","khaskjefeaf"))

Error: could not find function "r"

> y=factor(c("khaskj","khaskj","khewrewrewaskj","khasrtwtwwtkj","khaskqqewqeqwj","khaskjryhryhry","khaskjefeaf"))

> y

[1] khaskj khaskj khewrewrewaskj khasrtwtwwtkj khaskqqewqeqwj khaskjryhryhry

[7] khaskjefeaf

Levels: khaskj khaskjefeaf khaskjryhryhry khaskqqewqeqwj khasrtwtwwtkj khewrewrewaskj

> class(y)

[1] "factor"

> isClass(y)

Error in .getClassFromCache(Class, where, package = package, resolve.msg = resolve.msg) :

class should be either a character-string name or a class definition

> is.Class(y)

Error: could not find function "is.Class"

> is.Class

Error: object 'is.Class' not found

> mode(y)

[1] "numeric"

> data= read.table(C:\Users\Akki\Desktop\rsample.xslx)

Error: unexpected input in "data= read.table(C:\"

> data= read.table("C:\Users\Akki\Desktop\rsample.xslx",header=T)

Error: '\U' used without hex digits in character string starting ""C:\U"
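Both errors above come from how the path is written: the first call has no quotes, and in the second the backslashes are read as escape sequences. A minimal sketch of path spellings that R accepts is shown here. The file name rsample.csv is hypothetical; it assumes the spreadsheet was first saved as CSV, since read.table reads plain text rather than Excel workbooks.

```r
# Three equivalent ways to spell a Windows path in an R string.
# "rsample.csv" is a hypothetical file name used only for illustration.
path_forward <- "C:/Users/Akki/Desktop/rsample.csv"       # forward slashes
path_escaped <- "C:\\Users\\Akki\\Desktop\\rsample.csv"   # doubled backslashes
path_built   <- file.path("C:", "Users", "Akki", "Desktop", "rsample.csv")

# file.path joins its pieces with "/", so it matches the forward-slash form.
print(path_built)

# Converting the backslashes shows that the escaped form names the same file.
same_after_conversion <-
  gsub("\\", "/", path_escaped, fixed = TRUE) == path_forward
print(same_after_conversion)
```

Any of the three strings could then be passed to read.csv or read.table, provided the file actually exists at that location.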

> 119 %% 10

[1] 9

> 119 %/% 10 ==0

[1] FALSE

> 119 %/% 10 ==9

[1] FALSE

> 119 %% 10 ==9

[1] TRUE

Riley first chapter

>1+1

[1] 2

> "Arr, matey!"

[1] "Arr, matey!"

> example(min)

>help(min)

> rep("Yo ho!", times = 3)

> list.files()

[1] "bottle1.R" "bottle2.R"

> source("bottle1.R")

[1] "This be a message in a bottle1.R!"

Riley second chapter

Vectors

> c(3,4,5) # a vector storing three values of the same type

> c("f",1,"1") # the number 1 is coerced and stored as the character "1"

> c(1,TRUE,"THREE")

[1] "1" "TRUE" "THREE"
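The coercion seen above follows a fixed order: logical values become numbers, and numbers become strings, whenever a vector mixes types. A small sketch (the variable names are made up for illustration):

```r
# A vector holds a single type, so c() converts everything to the most
# general type present: logical -> numeric -> character.
mixed_logical <- c(1, TRUE)            # TRUE is converted to 1
mixed_string  <- c(1, TRUE, "THREE")   # everything becomes a string

print(mixed_logical)
print(class(mixed_logical))   # "numeric"
print(class(mixed_string))    # "character"
```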

Sequence vectors

>5:9

[1] 5 6 7 8 9

> seq(5,9)

[1] 5 6 7 8 9

> seq(5,9,0.5)

[1] 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0

Vector Access

We're going to create a vector with some strings in it for you, and store it in the sentence variable.

You can retrieve an individual value within a vector by providing its numeric index in square brackets.

Try getting the third value:

> sentence <- c('walk', 'the', 'plank')

> sentence[3]

[1] "plank"

You can assign new values within an existing vector. Try changing the third word to "dog":
> sentence[3]="dog"

If you add new values onto the end, the vector will grow to accommodate them. Let's add a fourth
word:

> sentence[4]="to"

You can use a vector within the square brackets to access multiple values. Try getting the first and third
words:

> sentence[c(1,3)]

[1] "walk" "dog"

> sentence[2:4]

[1] "the" "dog" "to"

You can also set ranges of values; just provide the values in a vector. Add words 5 through 7:

> sentence[5:7]=c("the","poop","deck")
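One detail worth knowing about growing a vector: if you assign past the end and skip positions, R fills the gap with NA. A short sketch, using a hypothetical crew vector rather than the sentence variable above:

```r
# Assigning to index 6 of a 3-item vector grows it to length 6.
crew <- c("walk", "the", "plank")
crew[6] <- "now"

print(crew)          # positions 4 and 5 are filled with NA
print(length(crew))  # 6
```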

Vector Names

You can assign names to a vector's elements by passing a second vector filled with names to the names
assignment function, like this:
> ranks <- 1:3

> names(ranks) <- c("first", "second", "third")
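Once names are assigned, they can be used in place of numeric indices, both to read and to set values. A minimal sketch:

```r
ranks <- 1:3
names(ranks) <- c("first", "second", "third")

# Read a value by name instead of by position.
print(ranks["second"])

# Set a value by name; the other elements are untouched.
ranks["third"] <- 30L
print(ranks)
```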

Plotting one vector

barplot

The barplot function draws a bar chart with a vector's values.

We'll make a new vector for you, and store it in the vesselsSunk variable.

> vesselsSunk=1:3

> barplot(vesselsSunk)

> vesselsSunk <- c(4, 5, 1) # bar heights (y values) are 4, 5, 1; with 5:7 they would be 5, 6, 7

> barplot(vesselsSunk)

> names(vesselsSunk) <- c("England", "France", "Norway")

Vector Math

> a=7:9

> a+1

[1] 8 9 10

> a

[1] 7 8 9

> a=a+1

> a

[1] 8 9 10

>a/2

[1] 4.0 4.5 5.0

>a*2

[1] 16 18 20

> a+(a*2)

[1] 24 27 30

>a-(a*2)

[1] -8 -9 -10

> c(99,99,99)<a

[1] FALSE FALSE FALSE

> sin(a)

[1]  0.9893582  0.4121185 -0.5440211

> sqrt(a)

[1] 2.828427 3.000000 3.162278
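All of the operations above work element by element. The same holds when both operands are vectors of equal length, which this sketch illustrates (b is a made-up second vector):

```r
a <- 8:10
b <- c(1, 2, 3)

# Arithmetic pairs up elements: 8+1, 9+2, 10+3.
total <- a + b
print(total)

# Comparisons are also done element by element.
matches <- a == c(8, 99, 10)
print(matches)
```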

Scatter Plots
The plot function takes two vectors, one for X values and one for Y values, and draws a graph of them.

Let's draw a graph showing the relationship of numbers and their sines.

First, we'll need some sample data. We'll create a vector for you with some fractional values between 1
and 20, and store it in the x variable.

Now, try creating a second vector with the sines of those values:

> x <- seq(1, 20, 0.1)

> y <- sin(x)

Then simply call plot with your two vectors:

> plot(x, y)

We'll create a vector with some negative and positive values for you, and store it in the values variable.

We'll also create a second vector with the absolute values of the first, and store it in the absolutes
variable.

Try plotting the vectors, with values on the horizontal axis, and absolutes on the vertical axis.

> values <- -10:10

> absolutes <- abs(values)

> plot(values, absolutes)

NA Values

Sometimes, when working with sample data, a given value isn't available. But it's not a good idea to just
throw those values out. R has a value that explicitly indicates a sample was not available: NA. Many
functions that work with vectors treat this value specially.

We'll create a vector for you with a missing sample, and store it in the a variable.

Try to get the sum of its values, and see what the result is:

> a <- c(1, 3, NA, 7, 9)

> sum(a)

[1] NA

The sum is considered "not available" by default because one of the vector's values was NA. This is the
responsible thing to do; R won't just blithely add up the numbers without warning you about the
incomplete data. We can explicitly tell sum (and many other functions) to remove NA values before they
do their calculations, however.

As you see in the documentation, sum can take an optional named argument, na.rm. It's set to FALSE by
default, but if you set it to TRUE, all NA arguments will be removed from the vector before the
calculation is performed.

Try calling sum again, with na.rm set to TRUE:

> sum(a, na.rm = TRUE)

[1] 20
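The na.rm argument is accepted by other summary functions too, and is.na lets you locate the missing values yourself. A short sketch using the same vector:

```r
a <- c(1, 3, NA, 7, 9)

# is.na marks each missing element; summing the marks counts them.
missing_count <- sum(is.na(a))
print(missing_count)

# mean also accepts na.rm, just like sum.
mean_without_na <- mean(a, na.rm = TRUE)
print(mean_without_na)

# Keep only the values that are present.
complete <- a[!is.na(a)]
print(complete)
```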
Riley third chapter

MATRICES

So far we've only worked with vectors, which are simple lists of values. What if you need data in rows
and columns? Matrices are here to help.

A matrix is just a fancy term for a 2-dimensional array. In this chapter, we'll show you all the basics of
working with matrices, from creating them, to accessing them, to plotting them.

Let's make a matrix 3 rows high by 4 columns wide, with all its fields set to 0.

> matrix(0, 3, 4)

[,1] [,2] [,3] [,4]

[1,] 0 0 0 0

[2,] 0 0 0 0

[3,] 0 0 0 0

> matrix(3,4)

[,1]

[1,] 3

[2,] 3

[3,] 3

[4,] 3

> matrix(0,3,4) # arguments: fill value, number of rows, number of columns

[,1] [,2] [,3] [,4]

[1,] 0 0 0 0

[2,] 0 0 0 0

[3,] 0 0 0 0

> a <- 8:10 # restore the earlier value of a; the NA section reassigned it

> matrix(a,3,4)

[,1] [,2] [,3] [,4]

[1,] 8 8 8 8

[2,] 9 9 9 9

[3,] 10 10 10 10

> matrix(b,3,4)

Error in matrix(b, 3, 4) : object 'b' not found

> matrix('b',3,4)

[,1] [,2] [,3] [,4]

[1,] "b" "b" "b" "b"

[2,] "b" "b" "b" "b"

[3,] "b" "b" "b" "b"

You can also use a vector to initialize a matrix's value. To fill a 3x4 matrix, you'll need a 12-item vector.
We'll make that for you now:

> a=1:12

If we print the value of a, we'll see the vector's values, all in a single row:

> print(a)

[1] 1 2 3 4 5 6 7 8 9 10 11 12

Now call matrix with the vector, the number of rows, and the number of columns:
> matrix(a, 3, 4)

[,1] [,2] [,3] [,4]

[1,] 1 4 7 10

[2,] 2 5 8 11

[3,] 3 6 9 12

The vector's values are copied into the new matrix, one by one. You can also re-shape the vector itself
into a matrix. Create an 8-item vector:

> plank <- 1:8

The dim assignment function sets dimensions for a matrix. It accepts a vector with the number of rows
and the number of columns to assign.

Assign new dimensions to plank by passing a vector specifying 2 rows and 4 columns (c(2, 4)):

> dim(plank) <- c(2, 4)

If you print plank now, you'll see that the values have shifted to form 2 rows by 4 columns:

> print(plank) # typing plank alone displays the same output

[,1] [,2] [,3] [,4]

[1,] 1 3 5 7

[2,] 2 4 6 8
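Both matrix(a, 3, 4) and the dim assignment fill the matrix column by column, which is R's default. If you want the values laid out row by row instead, matrix takes a byrow argument. A minimal sketch:

```r
# Default: values run down each column in turn.
by_column <- matrix(1:12, nrow = 3, ncol = 4)

# byrow = TRUE: values run across each row in turn.
by_row <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)

print(by_column)
print(by_row)
```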

Matrix Access

Getting values from matrices isn't that different from vectors; you just have to provide two indices
instead of one.
Let's take another look at our plank matrix:

> print(plank)

[,1] [,2] [,3] [,4]

[1,] 1 3 5 7

[2,] 2 4 6 8

Try getting the value from the second row in the third column of plank:

> plank[2, 3]

[1] 6

Now, try getting the value from the first row in the fourth column:

> plank[1,4]

[1] 7

As with vectors, to set a single value, just assign to it. Set the previous value to 0:

> plank[1, 4] <- 0

You can get an entire row of the matrix by omitting the column index (but keep the comma). Try
retrieving the second row:

> plank[2,]

[1] 2 4 6 8

To get an entire column, omit the row index. Retrieve the fourth column:

> plank[, 4]

[1] 0 8

You can read multiple rows or columns by providing a vector or sequence with their indices. Try
retrieving columns 2 through 4:

> plank[, 2:4]

[,1] [,2] [,3]

[1,] 3 5 0

[2,] 4 6 8
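Notice that a single row or column comes back as a plain vector, while several columns come back as a matrix. If you need a one-row result to stay a matrix, the drop argument of the bracket operator controls that. A small sketch with a fresh copy of plank:

```r
plank <- matrix(1:8, nrow = 2, ncol = 4)

# By default a single row is simplified to a vector (no dimensions).
row_vector <- plank[2, ]
print(dim(row_vector))   # NULL

# drop = FALSE keeps the result as a 1-row matrix.
row_matrix <- plank[2, , drop = FALSE]
print(dim(row_matrix))   # 1 4
```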

MATRIX PLOTTING

Text output is only useful when matrices are small. When working with more complex data, you'll need
something better. Fortunately, R includes powerful visualizations for matrix data.

We'll start simple, with an elevation map of a sandy beach.

It's pretty flat - everything is 1 meter above sea level. We'll create a 10 by 10 matrix with all its values
initialized to 1 for you:

> elevation <- matrix(1, 10, 10)

Oh, wait, we forgot the spot where we dug down to sea level to retrieve a treasure chest. At the fourth
row, sixth column, set the elevation to 0:

> elevation[4, 6] <- 0

You can now do a contour map of the values simply by passing the matrix to the contour function:
> contour(elevation)

> elevation
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 1 1 1 1 1 1
[2,] 1 1 1 1 1 1 1 1 1 1
[3,] 1 1 1 1 1 1 1 1 1 1
[4,] 1 1 1 1 1 0 1 1 1 1
[5,] 1 1 1 1 1 1 1 1 1 1
[6,] 1 1 1 1 1 1 1 1 1 1
[7,] 1 1 1 1 1 1 1 1 1 1
[8,] 1 1 1 1 1 1 1 1 1 1
[9,] 1 1 1 1 1 1 1 1 1 1
[10,] 1 1 1 1 1 1 1 1 1 1
>persp(elevation)

The perspective plot looks a little odd, though. This is because persp
automatically expands the view so that your highest value (the beach surface) is
at the very top.

We can fix that by specifying our own value for the expand parameter.

>persp(elevation,expand=0.2)
Okay, those examples are a little simplistic. Thankfully, R includes some sample
data sets to play around with. One of these is volcano, a 3D map of a dormant
New Zealand volcano.
It's simply an 87x61 matrix with elevation values, but it shows the power of R's
matrix visualizations.

Try creating a contour map of the volcano matrix:

>contour(volcano)
Try a perspective plot (limit the vertical expansion to one-fifth again):

> persp(volcano, expand=0.2)

The image function will create a heat map:

> image(volcano)
Chapter 4 Summary Statistics

Determining the health of the crew is an important part of any inventory of the
ship. Here's a vector containing the number of limbs each member has left, along
with their names.

limbs <- c(4, 3, 4, 3, 2, 4, 4, 4)

names(limbs) <- c('One-Eye', 'Peg-Leg', 'Smitty', 'Hook',
'Scooter', 'Dan', 'Mikey', 'Blackbeard')
A quick way to assess our battle-readiness would be to get the average of the
crew's appendage counts. Statisticians call this the "mean". Call the mean
function with the limbs vector.
> mean(limbs)
[1] 3.5
> barplot(limbs)
> abline(h=mean(limbs)) # horizontal line at the mean, parallel to the x axis
> abline(v=mean(limbs)) # vertical line at x = mean, parallel to the y axis
> median(limbs)
[1] 4

Standard deviation


Some of the plunder from our recent raids has been worth less than what we're used to.
Here's a vector with the values of our latest hauls:

> pounds <- c(45000, 50000, 35000, 40000, 35000, 45000, 10000, 15000)

> barplot(pounds)

> meanValue <- mean(pounds)

Let's see a plot showing the mean value:

> abline(h = meanValue)

These results seem way below normal. The crew wants to make Smitty, who picked the
last couple ships to waylay, walk the plank. But as he dangles over the water, wily Smitty
raises a question: what, exactly, is a "normal" haul?

Statisticians use the concept of "standard deviation" from the mean to describe the range
of typical values for a data set. For a group of numbers, it shows how much they typically
vary from the average value. To calculate the standard deviation, you calculate the mean
of the values, then subtract the mean from each number and square the result, then
average those squares, and take the square root of that average. (R's sd function computes
the sample standard deviation, so when averaging the squares it divides by n - 1 rather than n.)
If that sounds like a lot of work, don't worry. You're using R, and all you have to do is
pass a vector to the sd function. Try calling sd on the pounds vector now, and assign the
result to the deviation variable:


> deviation <- sd(pounds)
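To see what sd does under the hood, the steps described above can be written out by hand. This is only a sketch for comparison; the names squared_diffs, sample_sd, and population_sd are made up here. It also shows the difference between dividing by n - 1 (what sd does) and dividing by n.

```r
pounds <- c(45000, 50000, 35000, 40000, 35000, 45000, 10000, 15000)

# Subtract the mean from each value and square the result.
squared_diffs <- (pounds - mean(pounds))^2

# Sample standard deviation: divide by n - 1, as sd() does.
sample_sd <- sqrt(sum(squared_diffs) / (length(pounds) - 1))

# Population standard deviation: a plain average of the squares (divide by n).
population_sd <- sqrt(mean(squared_diffs))

print(sample_sd)
print(sd(pounds))       # agrees with sample_sd
print(population_sd)    # slightly smaller
```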

We'll add a line on the plot to show one standard deviation above the mean (the top of the
normal range)...

> abline(h = meanValue + deviation)
Chapter 5 FACTORS

Creating factors
Often your data needs to be grouped by category: blood pressure by age range,
accidents by auto manufacturer, and so forth. R has a special collection type
called a factor to track these categorized values.

It's time to take inventory of the ship's hold. We'll make a vector for you with the
type of booty in each chest.

To categorize the values, simply pass the vector to the factor function:

> chests <- c('gold', 'silver', 'gems', 'gold', 'gems')

> types <- factor(chests)

There are a couple differences between the original vector and the new factor that are
worth noting. Print the chests vector:
> chests

[1] "gold" "silver" "gems" "gold" "gems"

You see the raw list of strings, repeated values and all. Now print the types factor:
> types

[1] gold silver gems gold gems

Levels: gems gold silver

Printed at the bottom, you'll see the factor's "levels" - groups of unique values. Notice
also that there are no quotes around the values. That's because they're not strings; they're
actually integer references to one of the factor's levels.

Let's take a look at the underlying integers. Pass the factor to the as.integer function:
> as.integer(types)

[1] 2 3 1 2 1
You can get only the factor levels with the levels function:
> levels(types)
[1] "gems" "gold" "silver"
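The integer codes and the levels fit together: each code is a position in the levels vector. A short sketch that rebuilds the original strings from the two pieces, and counts the chests per type with table:

```r
chests <- c('gold', 'silver', 'gems', 'gold', 'gems')
types <- factor(chests)

# Each integer code indexes into the levels vector.
codes <- as.integer(types)
rebuilt <- levels(types)[codes]
print(rebuilt)

# table counts how many values fall into each level.
counts <- table(types)
print(counts)
```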

PLOTS WITH FACTORS

You can use a factor to separate plots into categories. Let's graph our five chests by
weight and value, and show their type as well. We'll create two vectors for
you; weights will contain the weight of each chest, and prices will track how much
the chests are worth.
Now, try calling plot to graph the chests by weight and value.
> weights <- c(300, 200, 100, 250, 150)

> prices <- c(9000, 5000, 12000, 7500, 18000)

> plot(weights, prices)

We can't tell which chest is which, though. Fortunately, we can use different plot
characters for each type by converting the factor to integers, and passing it to
the pch argument of plot.
> plot(weights, prices, pch=as.integer(types)) # pch is the plot argument that selects the plotting character
"Circle", "Triangle", and "Plus Sign" still aren't great descriptions for treasure, though.
Let's add a legend to show what the symbols mean.

The legend function takes a location to draw in, a vector with label names, and a vector
with numeric plot character IDs.
> legend("topright", c("gems", "gold", "silver"), pch=1:3)

> legend("topleft", c("gems", "gold", "silver"), pch=1:3)

> legend("bottomleft", c("gems", "gold", "silver"), pch=1:3)

> legend("bottomright", c("gems", "gold", "silver"), pch=1:3)

Next time the boat's taking on water, it would be wise to dump the silver and keep the
gems!

If you hard-code the labels and plot characters, you'll have to update them every time you
change the plot factor. Instead, it's better to derive them by using the levels function on
your factor:
> legend("topright", levels(types), pch=1:length(levels(types)))
CHAPTER 6 DATA FRAMES

The weights, prices, and types data structures are all deeply tied together, if you think about it. If
you add a new weight sample, you need to remember to add a new price and type, or risk
everything falling out of sync. To avoid trouble, it would be nice if we could tie all these
variables together in a single data structure.
Fortunately, R has a structure for just this purpose: the data frame. You can think of a data frame
as something akin to a database table or an Excel spreadsheet. It has a specific number of
columns, each of which is expected to contain values of a particular type. It also has an
indeterminate number of rows - sets of related values for each column.

Our vectors with treasure chest data are perfect candidates for conversion to a data frame. And
it's easy to do. Call the data.frame function, and pass weights, prices, and types as the arguments.
Assign the result to the treasure variable:

> treasure <- data.frame(weights, prices, types)

Now, try printing treasure to see its contents:

> print(treasure)

weights prices types

1 300 9000 gold

2 200 5000 silver

3 100 12000 gems

4 250 7500 gold

5 150 18000 gems

DATA FRAME ACCESS

Just like matrices, it's easy to access individual portions of a data frame.

You can get individual columns by providing their index number in double-brackets. Try
getting the second column (prices) of treasure:

> treasure[[2]]
[1] 9000 5000 12000 7500 18000
You could instead provide a column name as a string in double-brackets. (This is often
more readable.) Retrieve the "weights" column:

> treasure[["weights"]]

[1] 300 200 100 250 150

Typing all those brackets can get tedious, so there's also a shorthand notation: the data
frame name, a dollar sign, and the column name (without quotes). Try using it to get
the "prices" column:
> treasure$prices
[1] 9000 5000 12000 7500 18000

> treasure[1,2]

[1] 9000

> treasure[1]

weights

1 300

2 200

3 100

4 250

5 150

> treasure[,1]

[1] 300 200 100 250 150

> treasure[1,]

weights prices types

1 300 9000 gold
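The row and column indices can also be logical conditions, which is a common way to pick out the rows you care about. A minimal sketch that rebuilds treasure and filters it (gems_only and heavy_prices are names made up for this example):

```r
weights <- c(300, 200, 100, 250, 150)
prices <- c(9000, 5000, 12000, 7500, 18000)
types <- factor(c('gold', 'silver', 'gems', 'gold', 'gems'))
treasure <- data.frame(weights, prices, types)

# Keep only the rows whose type is "gems" (all columns).
gems_only <- treasure[treasure$types == "gems", ]
print(gems_only)

# Prices of the chests that weigh more than 200.
heavy_prices <- treasure$prices[treasure$weights > 200]
print(heavy_prices)
```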

LOAD DATA FRAME

Typing in all your data by hand only works up to a point, obviously, which is why R was
given the capability to easily load data in from external files.

We've created a couple data files for you to experiment with:

> list.files()

[1] "targets.csv" "infantry.txt"

Our "targets.csv" file is in the CSV (Comma Separated Values) format exported by many
popular spreadsheet programs. Here's what its content looks like:

"Port","Population","Worth"

"Cartagena",35000,10000

"Porto Bello",49000,15000

"Havana",140000,50000

"Panama City",105000,35000
You can load a CSV file's content into a data frame by passing the file name to
the read.csv function. Try it with the "targets.csv" file:

> read.csv("targets.csv")

Port Population Worth

1 Cartagena 35000 10000

2 Porto Bello 49000 15000

3 Havana 140000 50000

4 Panama City 105000 35000

The "infantry.txt" file has a similar format, but its fields are separated by tab
characters rather than commas. Its content looks like this:

Port Infantry

Porto Bello 700

Cartagena 500

Panama City 1500

Havana 2000
For files that use separator strings other than commas, you can use
the read.table function. The sep argument defines the separator character, and you
can specify a tab character with "\t".
Call read.table on "infantry.txt", using tab separators:
> read.table("infantry.txt", sep="\t")

V1 V2

1 Port Infantry

2 Porto Bello 700

3 Cartagena 500

4 Panama City 1500

5 Havana 2000
Notice the "V1" and "V2" column headers? The first line is not automatically treated as
column headers with read.table. This behavior is controlled by the header argument.
Call read.table again, setting header to TRUE:
> read.table("infantry.txt", sep="\t", header=TRUE)

Port Infantry

1 Porto Bello 700

2 Cartagena 500

3 Panama City 1500

4 Havana 2000
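If you don't have "infantry.txt" on hand, the same steps can be tried end to end by writing the tab-separated lines to a temporary file first. This is just a sketch; the temporary file stands in for the course's data file.

```r
# Write the tab-separated content to a temporary file.
infantry_file <- tempfile(fileext = ".txt")
writeLines(c("Port\tInfantry",
             "Porto Bello\t700",
             "Cartagena\t500",
             "Panama City\t1500",
             "Havana\t2000"),
           infantry_file)

# sep = "\t" splits on tabs; header = TRUE uses the first line as column names.
infantry <- read.table(infantry_file, sep = "\t", header = TRUE)
print(infantry)

# Clean up the temporary file.
unlink(infantry_file)
```

Because the separator is a tab, port names containing spaces (such as Porto Bello) stay in one field.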

MERGING DATA FRAME

We want to loot the city with the most treasure and the fewest guards. Right now,
though, we have to look at both files and match up the rows. It would be nice if all
the data for a port were in one place...

R's merge function can accomplish precisely that. It joins two data frames
together, using the contents of one or more columns. First, we're going to store
those file contents in two data frames for you, targets and infantry.
The merge function takes arguments with an x frame (targets) and a y frame
(infantry). By default, it joins the frames on columns with the same name (the
two Port columns). See if you can merge the two frames:

> targets <- read.csv("targets.csv")

> infantry <- read.table("infantry.txt", sep="\t", header=TRUE)

> merge(x = targets, y = infantry)

Port Population Worth Infantry

1 Cartagena 35000 10000 500

2 Havana 140000 50000 2000

3 Panama City 105000 35000 1500

4 Porto Bello 49000 15000 700
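By default merge keeps only the rows that appear in both frames. A sketch of what happens when one frame is missing a port, and how the by and all arguments change the result. The data is typed in by hand here, and Havana is left out of infantry on purpose for illustration.

```r
targets <- data.frame(
  Port = c("Cartagena", "Porto Bello", "Havana", "Panama City"),
  Population = c(35000, 49000, 140000, 105000),
  Worth = c(10000, 15000, 50000, 35000)
)

# Havana is deliberately omitted to show the effect on the join.
infantry <- data.frame(
  Port = c("Porto Bello", "Cartagena", "Panama City"),
  Infantry = c(700, 500, 1500)
)

# Default join: only ports present in both frames survive.
inner <- merge(targets, infantry, by = "Port")
print(inner)

# all = TRUE keeps every port and fills the gaps with NA.
outer <- merge(targets, infantry, by = "Port", all = TRUE)
print(outer)
```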

CHAPTER 7 REAL WORLD DATA

So far, we've been working purely in the abstract. It's time to take a look at some real
data, and see if we can make any observations about it.

7.1 Some Real World Data

Modern pirates plunder software, not silver. We have a file with the software piracy rate,
sorted by country. Here's a sample of its format:

Country,Piracy

Australia,23

Bangladesh,90

Brunei,67

China,77

...
We'll load that into the piracy data frame for you:
> piracy <- read.csv("piracy.csv")
We also have another file with GDP per capita for each country (wealth produced,
divided by population):

Rank Country GDP

1 Liechtenstein 141100

2 Qatar 104300

3 Luxembourg 81100

4 Bermuda 69900
...
That will go into the gdp frame:
> gdp <- read.table("gdp.txt", sep=" ", header=TRUE)
We'll merge the frames on the country names:

> countries <- merge(x = gdp, y = piracy)

Let's do a plot of GDP versus piracy. Call the plot function, using the "GDP" column
of countries for the horizontal axis, and the "Piracy" column for the vertical axis:

> plot(countries$GDP, countries$Piracy)

It looks like there's a negative correlation between wealth and piracy - generally, the
higher a nation's GDP, the lower the percentage of software installed that's pirated. But
do we have enough data to support this connection? Is there really a connection at all?

R can test for correlation between two vectors with the cor.test function. Try calling it on
the GDP and Piracy columns of the countries data frame:


> cor.test(countries$GDP, countries$Piracy)

Pearson's product-moment correlation

data: countries$GDP and countries$Piracy

t = -14.8371, df = 107, p-value < 2.2e-16

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

-0.8736179 -0.7475690

sample estimates:

cor

-0.8203183

The key result we're interested in is the "p-value". Conventionally, any correlation with a
p-value less than 0.05 is considered statistically significant, and this sample data's p-value
is definitely below that threshold. In other words, yes, these data do show a statistically
significant negative correlation between GDP and software piracy.
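Rather than reading the p-value off the printed report, you can pull it out of the object that cor.test returns. The sketch below uses made-up numbers (not the course's piracy data) just to show the fields; p.value, estimate, and conf.int are components of the returned result.

```r
# Made-up data with a built-in negative trend, for illustration only.
set.seed(42)
gdp_made_up <- seq(1000, 50000, length.out = 30)
piracy_made_up <- 90 - gdp_made_up / 1000 + rnorm(30, sd = 5)

result <- cor.test(gdp_made_up, piracy_made_up)

print(result$estimate)   # the correlation coefficient
print(result$p.value)    # the p-value
print(result$conf.int)   # the 95 percent confidence interval

# Apply the conventional 0.05 threshold in code.
significant <- result$p.value < 0.05
print(significant)
```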
We have more countries represented in our GDP data than we do our piracy rate data. If
we know a country's GDP, can we use that to estimate its piracy rate?

We can, if we calculate the linear model that best represents all our data points (with a
certain degree of error). The lm function takes a model formula, which is represented by
a response variable (piracy rate), a tilde character (~), and a predictor variable (GDP).
(Note that the response variable comes first.)
Try calculating the linear model for piracy rate by GDP, and assign it to
the line variable:

> line <- lm(countries$Piracy ~ countries$GDP)

You can draw the line on the plot by passing it to the abline function. Try it now:


> abline(line)
Now, if we know a country's GDP, we should be able to make a reasonable prediction of
how common piracy is there!
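To actually get such a prediction, one option is the predict function. A sketch under stated assumptions: the data below is made up, and the model is written with the data argument (Piracy ~ GDP, data = ...) rather than with countries$ prefixes, because predict matches new values to the model by column name.

```r
# Made-up stand-in for the countries data frame, for illustration only.
set.seed(1)
countries_made_up <- data.frame(GDP = seq(1000, 50000, length.out = 30))
countries_made_up$Piracy <-
  90 - countries_made_up$GDP / 1000 + rnorm(30, sd = 5)

# Fit the linear model using column names plus a data argument.
fit <- lm(Piracy ~ GDP, data = countries_made_up)
print(coef(fit))   # intercept and slope

# Estimate the piracy rate for two hypothetical GDP values.
new_countries <- data.frame(GDP = c(10000, 40000))
estimates <- predict(fit, newdata = new_countries)
print(estimates)
```

The new data frame must use the same column name (GDP) as the predictor in the formula.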

7.2 ggplot2
The functionality we've shown you so far is all included with R by default. (And it's
pretty powerful, isn't it?) But in case the default installation doesn't include that function
you need, there are still more libraries available on the servers of the Comprehensive R
Archive Network, or CRAN. They can add anything from new statistical functions to
better graphics capabilities. Better yet, installing any of them is just a command away.

Let's install the popular ggplot2 graphics package. Call the install.packages function
with the package name in a string:

> install.packages("ggplot2")

--- Please select a CRAN mirror for use in this session ---

Loading Tcl/Tk interface ... done

trying URL 'http://rweb.quant.ku.edu/cran/src/contrib/ggplot2_0.9.2.1.tar.gz'

Content type 'application/x-gzip' length 2310996 bytes (2.2 Mb)

opened URL

==================================================

downloaded 2.2 Mb
* installing *source* package 'ggplot2' ...

** package 'ggplot2' successfully unpacked and MD5 sums checked

** R

** data

** moving datasets to lazyload DB

** inst

** preparing package for lazy loading

** help

*** installing help indices

** building package indices

** testing if installed package can be loaded

* DONE (ggplot2)
You can get help for a package by calling the help function and passing the package
name in the package argument. Try displaying help for the "ggplot2" package:


> help(package = "ggplot2")

Information on package 'ggplot2'

Description:

Package: ggplot2

Type: Package

Title: An implementation of the Grammar of Graphics

Version: 0.9.1

...

Here's a quick demo of the power you've just added to R. To use it, let's revisit some data
from a previous chapter.
> weights <- c(300, 200, 100, 250, 150)

> prices <- c(9000, 5000, 12000, 7500, 18000)

> chests <- c('gold', 'silver', 'gems', 'gold', 'gems')

> types <- factor(chests)

The qplot function is a commonly-used part of ggplot2. We'll pass the weights and values
of our cargo to it, using the chest types vector for the color argument:

> library(ggplot2) # load the installed package so that qplot is available

> qplot(weights, prices, color = types)

Not bad! An attractive grid background and colorful legend, without any of the
configuration hassle from before!
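As far as I know, qplot is marked as deprecated in recent ggplot2 releases, so it may print a warning. A sketch of the same plot written with the ggplot function instead, assuming ggplot2 is installed; the chest_data frame is a name introduced here, and the check with requireNamespace simply skips the plot if the package is missing.

```r
weights <- c(300, 200, 100, 250, 150)
prices <- c(9000, 5000, 12000, 7500, 18000)
types <- factor(c('gold', 'silver', 'gems', 'gold', 'gems'))

# ggplot works from a data frame, so gather the vectors into one.
chest_data <- data.frame(weights, prices, types)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Map weight to x, price to y, and chest type to point color.
  chest_plot <- ggplot(chest_data,
                       aes(x = weights, y = prices, color = types)) +
    geom_point()
  print(chest_plot)
} else {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") first.")
}
```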

ggplot2 is just the first of many powerful packages awaiting discovery on CRAN. And
of course, there's much, much more functionality in the standard R libraries. This course
has only scratched the surface!

Assignment 2: Introduction To R: Text Like This Will Be Problems For You To Do and Turn In. (There Are 7 in All.)
No ratings yet
Assignment 2: Introduction To R: Text Like This Will Be Problems For You To Do and Turn In. (There Are 7 in All.)
15 pages
Network Analysis and Visualization With R and Igraph
No ratings yet
Network Analysis and Visualization With R and Igraph
62 pages
Introduction To R PDF
No ratings yet
Introduction To R PDF
56 pages
R-Unit 2
No ratings yet
R-Unit 2
81 pages
Practical 1- Basics of R
No ratings yet
Practical 1- Basics of R
8 pages
KD Lab - 1 Introductions To R
No ratings yet
KD Lab - 1 Introductions To R
12 pages
About R Language: Installation