R Course Own English HS

R and RStudio are tools for data analysis and visualization. R is the programming language and RStudio is an integrated development environment (IDE) that provides a user-friendly interface for working in R. The tidyverse is a collection of R packages that provide a consistent set of functions for data manipulation, visualization, and analysis using verbs like filter(), summarise(), and group_by(). Tables in R like dataframes and tibbles store data in columns that can be manipulated using these verbs and the pipe operator (%>%) to chain commands together.

Uploaded by

Pedro Henrique

Introduction to R / RStudio

BMS

Karin Groothuis-Oudshoorn & Robert Marinescu-Muster

August 27th

Program today

• Introduction R / RStudio
• tables in R (with dplyr)
• Make table objects
• Select rows/columns
• Sort rows
• Add new columns
• Aggregate data
• Make graphs with ggplot2
• Simple regression models

R for Data Science

R for Data Science book

book online: http://r4ds.had.co.nz/

R and RStudio

• R: the engine
• RStudio: the dashboard

RStudio

R and R packages

• R: a new phone
• R packages: apps that you can download

Installing and using packages

1. Install package: only once
2. Load package: every session
3. Reinstall package: if you update R
4. List the packages that are loaded by default:
## which packages are loaded by default?
search()

## [1] ".GlobalEnv" "package:stats" "package:graphics"

## [4] "package:grDevices" "package:utils" "package:datasets"
## [7] "package:methods" "Autoloads" "package:base"
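
The install-once, load-every-session pattern can be sketched as below. This is only a minimal sketch: the helper name `load_or_install()` is hypothetical (it is not part of R or RStudio), and the install step assumes an internet connection and a configured CRAN mirror.

```r
# Hypothetical helper (not part of R): install a package only when it is
# missing, then attach it for the current session.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)              # step 1: needed only once
  }
  library(pkg, character.only = TRUE)  # step 2: needed in every new session
  # report whether the package now appears in the search path
  invisible(paste0("package:", pkg) %in% search())
}

# "stats" ships with R, so nothing is downloaded here
attached <- load_or_install("stats")
```

After the call, the package shows up in the output of `search()`, just like the default packages listed above.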

Installing packages

The easy way, in the lower right panel of RStudio:
a) Click on the ’Packages’ tab
b) Click on ’Install’
c) Type the name of the package under ’Packages’
d) Click on ’Install’

Tidyverse

• Developer: Hadley Wickham (of RStudio)
• Collection of packages: dplyr, ggplot2, tibble, readr, tidyr, purrr, stringr, forcats

• More consistent than standard R

• A good starting point to learn R

• Is not standard R!

• Webpage: http://www.tidyverse.org
install.packages("tidyverse") # install once; the package name must be quoted
library(tidyverse)            # load in every session

Base R
Variables / Vectors: int, num

a <- 1L
class(a)

## [1] "integer"
vec1 <- c(1L,3L,5L,7L)
class(vec1)

## [1] "integer"
vec2 <- c(10.5,3.2,pi,4)
class(vec2)

## [1] "numeric"
vec1 + vec2

## [1] 11.500000 6.200000 8.141593 11.000000
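
The addition above works because R operates on whole vectors and converts types where needed. A small sketch of three related rules (element-wise arithmetic, type coercion, and recycling); the object names are illustrative only:

```r
int_vec <- c(1L, 3L, 5L, 7L)        # integer vector
num_vec <- c(10.5, 3.2, pi, 4)      # numeric (double) vector

# Arithmetic is element-wise; integer + numeric gives numeric
sum_vec <- int_vec + num_vec
class(sum_vec)

# Mixing types inside c() coerces everything to the most general type
mixed <- c(1L, 2.5, "three")
class(mixed)

# A shorter vector is recycled to match the longer one
recycled <- int_vec * c(1, 10)      # 1*1, 3*10, 5*1, 7*10
```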

Variables / Vectors: chr, lgl, date

• vec3: character vector ("character")
• vec4: logical vector ("logical")
• vec5: date vector ("Date")
vec3 <- c("low","low","high","medium")
vec4 <- c(TRUE,FALSE,TRUE,FALSE)
library(lubridate)
vec5 <- ymd(c("2000-9-14","2002-7-3",
"2004-4-14","2004-6-10"))

Tables: dataframe or tibble

• Columns of a dataframe / tibble are vectors

• Each column/vector has equal length and may have different
types of data
library(tibble)
table <- tibble(
col1 = vec1,
col2 = vec2,
col3 = vec3,
col4 = vec4,
col5 = vec5
)
table

## # A tibble: 4 x 5
##    col1  col2 col3   col4  col5
##   <int> <dbl> <chr>  <lgl> <date>
## 1     1 10.5  low    TRUE  2000-09-14
## 2     3  3.2  low    FALSE 2002-07-03
## 3     5  3.14 high   TRUE  2004-04-14
## 4     7  4    medium FALSE 2004-06-10

Work with columns and vectors

• Use $ to work with columns

• Select elements or parts of a vector / table with the brackets: [
and ]
table$col1 # select the column col1 from the table
table[,1] # select the first column of the table
table[1,] # select the first row of the table
table[1:2,3:4] # select the first 2 rows of the 3rd and 4th columns
table[table$col3 == "low",] # select all rows with col3 == "low"
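
The same bracket rules can be tried on a small base R data frame. A minimal sketch; the names `df`, `x`, and `grp` are made up for illustration:

```r
df <- data.frame(
  x   = c(1L, 3L, 5L, 7L),
  grp = c("low", "low", "high", "medium")
)

df$x                      # one column as a vector
df[1, ]                   # first row (all columns)
df[1:2, "grp"]            # rows 1 and 2 of column grp

# The condition is just a logical vector, one value per row
is_low <- df$grp == "low"

# Rows where the condition is TRUE are kept
low_rows <- df[is_low, ]
```

One difference worth knowing: with a tibble, `table[, 1]` stays a one-column tibble, whereas a base data frame returns a plain vector.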

Functions

• a function has a name and arguments

• the arguments go between parentheses ()
• do not put a space between the function name and the opening parenthesis
• use the help() function for more information about a function
sum(c(3,5,7))

## [1] 15
mean(c(10,14,2000))

## [1] 674.6667
mean(c(10,14,NA), na.rm = TRUE)

## [1] 12
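
A short sketch of how arguments behave, using only base R functions: default values, named arguments, and looking up the documentation.

```r
x <- c(10, 14, NA)

m_default <- mean(x)                 # NA: missing values propagate by default
m_clean   <- mean(x, na.rm = TRUE)   # NA removed before averaging

# Arguments can be matched by position or by name
r1 <- round(3.14159, 2)
r2 <- round(digits = 2, x = 3.14159)

# Look up the documentation of a function
# help(mean)   # or: ?mean
```

Writing `TRUE` in full is safer than `T`, because `T` is an ordinary variable that can be overwritten.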

Objects

• an object contains the result of an assignment
• an object can be used in a new assignment
• you create an object with <- (or with =)
• objects are stored in the global environment (shown in the Environment pane, the upper right panel in the default RStudio layout)
• objects in the global environment are deleted when you quit R/RStudio (unless you save them)
a_row <- seq(from = 100, to = 200, by = 10)
a_row2 <- 2*a_row
a_row2

## [1] 200 220 240 260 280 300 320 340 360 380 400
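
One way to keep an object beyond the current session is to write it to a file. A minimal sketch with base R's `saveRDS()` and `readRDS()`; the temporary file path is only for illustration, in practice you would choose a path inside your project:

```r
a_row <- seq(from = 100, to = 200, by = 10)

# Save one object to a file
path <- tempfile(fileext = ".rds")
saveRDS(a_row, path)

# Remove it from the global environment, as happens when R is closed
rm(a_row)

# Read it back, for example in a later session
a_row <- readRDS(path)
```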

Naming of objects

• the name of an object starts with a letter

• the name of an object contains only letters, numbers, _ and .
• lowercase and uppercase letters are NOT the same (R is case-sensitive)
• don't use spaces in names

R script

• Document with lines of assignments / code

• Extension of the file is .R
• Comments start with a hash sign (#)
• Run the code with the Run button (at the top of the script editor in RStudio)
• Output is shown in the Console
• Improves reproducibility

RStudio Project

• A directory on the hard disk

• Put scripts and data in a project directory
• A project directory is a working directory for R
• RStudio places some standard files in a project directory
• You can make a project directory with File > New Project
• You can open an existing project directory with File > Open Project
• Project settings are saved in the file <your_project_name>.Rproj

Exercise session 1
Tables in R
Tables

• A table object has a name

• A table has rows and columns
• Data in a column should have the same type
• Columns are all of equal length
• Columns have a name
• Rows sometimes have a name (possible for dataframes, not for tibbles)

Example: Gapminder table

library(gapminder)
gapminder

## # A tibble: 1,704 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # ... with 1,694 more rows

Base R versus tidyverse

# Base R
asia <- gapminder[gapminder$continent == "Asia",]
mean(asia$lifeExp)

## [1] 60.0649
library(dplyr) # Tidyverse
gapminder %>%
filter(continent == "Asia") %>%
summarise(mean_exp = mean(lifeExp))

## # A tibble: 1 x 1
## mean_exp
## <dbl>
## 1 60.1
The pipe operator

• A way of chaining commands one after another
• You can read it as "and then"
• Part of the package magrittr (but loaded automatically with dplyr)
gapminder %>%
filter(continent == "Asia") %>%
summarise(mean_exp = mean(lifeExp))
# without pipe
temp <- filter(gapminder, continent == "Asia")
temp <- summarise(temp, mean_exp = mean(lifeExp))
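
The "and then" reading applies to any chain of function calls, not only to dplyr verbs. A small sketch in base R, assuming R 4.1.0 or later, where the native pipe `|>` is available alongside `%>%`:

```r
values <- c(1, 4, 9, 16)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(values)), 1)

# Piped: read from left to right, "take values, and then ..."
piped <- values |>
  sqrt() |>
  mean() |>
  round(1)
```

Both forms give the same result; the pipe passes the left-hand side as the first argument of the next function.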
Seven verbs for data wrangling (package dplyr)

• filter
• summarise
• group_by (and ungroup)
• mutate
• arrange
• rename
• select

filter()

• select rows from a table

• arguments are filters that you want to apply
• use == to compare values
gap_2007 <- gapminder %>%
filter(year == 2007)
head(gap_2007, n = 4)

## # A tibble: 4 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 2007 43.8 31889923 975.
## 2 Albania Europe 2007 76.4 3600523 5937.
## 3 Algeria Africa 2007 72.3 33333216 6223.
## 4 Angola Africa 2007 42.7 12420476 4797.

filter(): more filters simultaneously (OR operator)

• use | to use more filters simultaneously

• checks if at least one filter is satisfied
gapminder %>%
filter(year == 2007 | continent == "Asia") %>%
sample_n(8)

## # A tibble: 8 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Djibouti Africa 2007 54.8 496374 2082.
## 2 Bahrain Asia 1977 65.6 297410 19340.
## 3 Thailand Asia 1997 67.5 60216677 5853.
## 4 Namibia Africa 2007 52.9 2055080 4811.
## 5 Singapore Asia 1982 71.8 2651869 15169.
## 6 Korea, Dem. Rep. Asia 2007 67.3 23301725 1593.
## 7 Nepal Asia 1997 59.4 23001113 1011.
## 8 Reunion Africa 2007 76.4 798094 7670.

filter(): more filters simultaneously (AND operator)

• use , or & to combine filters

• checks whether all filters are satisfied
gapminder %>%
filter(year == 2002, continent == "Asia")

## # A tibble: 33 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 2002 42.1 25268405 727.
## 2 Bahrain Asia 2002 74.8 656397 23404.
## 3 Bangladesh Asia 2002 62.0 135656790 1136.
## 4 Cambodia Asia 2002 56.8 12926707 896.
## 5 China Asia 2002 72.0 1280400000 3119.
## 6 Hong Kong, China Asia 2002 81.5 6762476 30209.
## 7 India Asia 2002 62.9 1034172547 1747.
## 8 Indonesia Asia 2002 68.6 211060000 2874.
## 9 Iran Asia 2002 69.5 66907826 9241.
## 10 Iraq Asia 2002 57.0 24001816 4391.
## # ... with 23 more rows
filter(): use of %in%

• %in% replaces repeated use of | with ==

gapminder %>%
filter(year %in% c(1987,1992,1997),
country %in% c("Netherlands","Belgium"))

## # A tibble: 6 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Belgium Europe 1987 75.4 9870200 22526.
## 2 Belgium Europe 1992 76.5 10045622 25576.
## 3 Belgium Europe 1997 77.5 10199787 27561.
## 4 Netherlands Europe 1987 76.8 14665278 23651.
## 5 Netherlands Europe 1992 77.4 15174244 26791.
## 6 Netherlands Europe 1997 78.0 15604464 30246.
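
That `%in%` stands in for a chain of `==` comparisons joined by `|` can be checked on a plain vector. A minimal base R sketch with made-up values:

```r
year <- c(1982, 1987, 1992, 1997, 2002)

# Written out with == and |
long_form  <- year == 1987 | year == 1992 | year == 1997

# The same test with %in%
short_form <- year %in% c(1987, 1992, 1997)

selected <- year[short_form]
```

Avoid `year == c(1987, 1992, 1997)`: that compares element by element with recycling instead of testing membership.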

summarise()

• function to calculate aggregate statistics

stats_2007 <- gapminder %>%
filter(year == 2007) %>%
summarise(max_exp = max(lifeExp),
mean_exp = mean(lifeExp),
sd_exp = sd(lifeExp))
stats_2007

## # A tibble: 1 x 3
## max_exp mean_exp sd_exp
## <dbl> <dbl> <dbl>
## 1 82.6 67.0 12.1

combination of group_by() and summarise()

• calculate a numerical summary per group, with the groups defined by a categorical variable
stats_2007 <- gapminder %>%
filter(year == 2007) %>%
group_by(continent) %>%
summarise(max_exp = max(lifeExp),
mean_exp = mean(lifeExp),
sd_exp = sd(lifeExp),
aantal = n())
stats_2007

## # A tibble: 5 x 5
## continent max_exp mean_exp sd_exp aantal
## <fct> <dbl> <dbl> <dbl> <int>
## 1 Africa 76.4 54.8 9.63 52
## 2 Americas 80.7 73.6 4.44 25
## 3 Asia 82.6 70.7 7.96 33
## 4 Europe 81.8 77.6 2.98 30
## 5 Oceania 81.2 80.7 0.729 2
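
To see what group_by() followed by summarise() does, it can help to run it on a table small enough to check by hand. A minimal sketch assuming dplyr is installed; the table `scores` and its columns are made up:

```r
library(dplyr)

# Hypothetical toy data: two groups, five rows
scores <- tibble(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 10, 20, 30)
)

# One output row per group
per_group <- scores %>%
  group_by(group) %>%
  summarise(mean_value = mean(value),
            n_rows     = n())
```

The result has one row for group "a" (mean of 1 and 3) and one for group "b" (mean of 10, 20, and 30).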
mutate()

• create a new column with a fixed value

gap_plus <- gapminder %>%
mutate(just_one = 1)
head(gap_plus, n=4)

## # A tibble: 4 x 7
## country continent year lifeExp pop gdpPercap just_one
## <fct> <fct> <int> <dbl> <int> <dbl> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779. 1
## 2 Afghanistan Asia 1957 30.3 9240934 821. 1
## 3 Afghanistan Asia 1962 32.0 10267083 853. 1
## 4 Afghanistan Asia 1967 34.0 11537966 836. 1

mutate()

• create a new column with other variables/columns

gap_gdp <- gapminder %>%
mutate(gdp = pop * gdpPercap)
head(gap_gdp, n=4)

## # A tibble: 4 x 7
## country continent year lifeExp pop gdpPercap gdp
## <fct> <fct> <int> <dbl> <int> <dbl> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779. 6567086330.
## 2 Afghanistan Asia 1957 30.3 9240934 821. 7585448670.
## 3 Afghanistan Asia 1962 32.0 10267083 853. 8758855797.
## 4 Afghanistan Asia 1967 34.0 11537966 836. 9648014150.

mutate()

• modify existing columns
• this also works for a column created earlier in the same mutate() statement
gap_gdp <- gap_gdp %>%
mutate(gdp = gdp/1000000)
head(gap_gdp, n=4)

## # A tibble: 4 x 7
## country continent year lifeExp pop gdpPercap gdp
## <fct> <fct> <int> <dbl> <int> <dbl> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779. 6567.
## 2 Afghanistan Asia 1957 30.3 9240934 821. 7585.
## 3 Afghanistan Asia 1962 32.0 10267083 853. 8759.
## 4 Afghanistan Asia 1967 34.0 11537966 836. 9648.

combination of mutate() and group_by()

• mutate columns within each subcategory

gapminder %>%
filter(year == 2007) %>%
group_by(continent) %>%
mutate(rank = rank(desc(lifeExp))) %>% arrange(rank)

## # A tibble: 142 x 7
## # Groups: continent [5]
## country continent year lifeExp pop gdpPercap rank
## <fct> <fct> <int> <dbl> <int> <dbl> <dbl>
## 1 Australia Oceania 2007 81.2 20434176 34435. 1
## 2 Canada Americas 2007 80.7 33390141 36319. 1
## 3 Iceland Europe 2007 81.8 301931 36181. 1
## 4 Japan Asia 2007 82.6 127467972 31656. 1
## 5 Reunion Africa 2007 76.4 798094 7670. 1
## 6 Costa Rica Americas 2007 78.8 4133884 9645. 2
## 7 Hong Kong, China Asia 2007 82.2 6980412 39725. 2
## 8 Libya Africa 2007 74.0 6036914 12057. 2
## 9 New Zealand Oceania 2007 80.2 4115771 25185. 2
## 10 Switzerland Europe 2007 81.7 7554661 37506. 2
## # ... with 132 more rows
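
Unlike summarise(), a grouped mutate() keeps every row and computes the new column within each group. A minimal sketch assuming dplyr is installed; the toy table is made up:

```r
library(dplyr)

scores <- tibble(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 10, 20, 30)
)

centered <- scores %>%
  group_by(group) %>%
  mutate(group_mean = mean(value),           # computed within each group
         diff       = value - group_mean) %>% # uses the column made just above
  ungroup()                                   # drop the grouping afterwards
```

All five rows are still there; each row now carries the mean of its own group and its distance from that mean.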
arrange()

• Order rows by the values of one or more columns
• Default is ascending or alphabetical order
• NAs are placed at the end
gapminder %>% arrange(year,country) %>% head(5)

## # A tibble: 5 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Albania Europe 1952 55.2 1282697 1601.
## 3 Algeria Africa 1952 43.1 9279525 2449.
## 4 Angola Africa 1952 30.0 4232095 3521.
## 5 Argentina Americas 1952 62.5 17876956 5911.

arrange() : descending order

• use the function desc()

gapminder %>% filter(year == 2007) %>%
arrange(desc(lifeExp)) %>% head(n = 5)

## # A tibble: 5 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Japan Asia 2007 82.6 127467972 31656.
## 2 Hong Kong, China Asia 2007 82.2 6980412 39725.
## 3 Iceland Europe 2007 81.8 301931 36181.
## 4 Switzerland Europe 2007 81.7 7554661 37506.
## 5 Australia Oceania 2007 81.2 20434176 34435.

group_by() and ungroup()

• Split the table into groups with group_by() and undo the grouping with ungroup()
temp <- gapminder %>%
group_by(continent, country)
head(temp, n = 1)

## # A tibble: 1 x 6
## # Groups: continent, country [1]
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
temp <- temp %>% ungroup()
head(temp, n = 1)

## # A tibble: 1 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.

select(): select columns

• select columns from a table

## select continent and country
gapminder %>% select(continent, country)

## select all columns except gdpPercap
gapminder %>% select(-gdpPercap)

## select all columns from continent through pop
gapminder %>% select(continent:pop)

## select all columns except continent through pop
gapminder %>% select(-(continent:pop))
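
Besides names and ranges, select() also accepts helper functions that pick columns by pattern. A minimal sketch assuming dplyr is installed; the table `df` and its column names are made up:

```r
library(dplyr)

# Hypothetical toy table
df <- tibble(id = 1:2, gdp_total = c(5, 6), gdp_percap = c(1, 2), pop = c(10, 20))

gdp_cols  <- df %>% select(starts_with("gdp"))    # columns whose name starts with "gdp"
other     <- df %>% select(-starts_with("gdp"))   # everything else
pop_first <- df %>% select(pop, everything())     # pop first, then the rest
```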

rename(): change column name

• change the name of a column
• left of the = sign: new name
• right of the = sign: old name
gapminder_NL <- gapminder %>%
rename(land = country,
jaar = year,
levensverwachting = lifeExp)

head(gapminder_NL, n = 2)

## # A tibble: 2 x 6
## land continent jaar levensverwachting pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.

Some useful functions for tables

nrow(gapminder) # number of rows in the table
ncol(gapminder) # number of columns in the table
names(gapminder) # column names of the table
str(gapminder) # structure of the table
head(gapminder) # first rows of the table
tail(gapminder) # last rows of the table

Difference with and without pipe

## select continent and country
temp <- gapminder %>% select(continent, country) ## with pipe
temp <- select(gapminder, continent, country) ## without pipe

## add a column with gdp
gap_gdp <- gapminder %>%
  mutate(gdp = pop * gdpPercap) ## with pipe
gap_gdp <- mutate(gapminder, gdp = pop * gdpPercap) ## without pipe

## calculate the maximum life expectancy
stats_2007 <- gapminder %>%
  filter(year == 2007) %>%
  summarise(max_exp = max(lifeExp)) ## with pipe

temp <- filter(gapminder, year == 2007) ## without pipe
stats_2007 <- summarise(temp, max_exp = max(lifeExp))

Import datasets

• package readr
• via the menu (Import Dataset button in the Environment pane)
library(readr)
lbw <- read_csv("data/lbw_data.csv")

Make categorical variables

• package forcats
library(forcats)
lbw <- lbw %>%
mutate(low = factor(low),
low = fct_recode(low,"No" = "0", "Yes" = "1"),
race = factor(race),
race = fct_recode(race,"White" = "1","Black" = "2",
"Other" = "3"),
smoke = factor(smoke),
smoke = fct_recode(smoke,"No" = "0", "Yes" = "1"),
ht = factor(ht),
ht = fct_recode(ht,"No" = "0", "Yes" = "1"),
ui = factor(ui),
ui = fct_recode(ui,"No" = "0", "Yes" = "1"))
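
For comparison, the same recoding of a 0/1 variable can be done in one step with base R's `factor()`. This is an alternative sketch, not the forcats approach used above; the vector `smoke_raw` is made up:

```r
smoke_raw <- c(0, 1, 1, 0)

# levels = the codes found in the data, labels = the names to show instead
smoke <- factor(smoke_raw, levels = c(0, 1), labels = c("No", "Yes"))

levels(smoke)
table(smoke)
```

Note the direction of the arguments: `fct_recode()` takes "new" = "old" pairs, while `factor()` matches `levels` and `labels` by position.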

Exercise session 2
Make graphs with ggplot2
Package ggplot2

• Not standard R
• Based on Grammar of Graphics
• Graph = Data + Layout + Coordinate system
• Graph can have more layers
• A layer has aesthetic (aes) properties coupled with properties of the data
• Handy cheatsheet: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

Scatterplot example code

library(ggplot2)

ggplot(lbw, aes(x = age, y = bwt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  coord_cartesian(xlim = c(15,46), ylim = c(0, 5500)) +
  labs(title = "Birth weight and age mother",
       x = "Age mother",
       y = "Birth weight")

Scatterplot example

[Figure: scatterplot titled "Birth weight and age mother"; y-axis: Birth weight (ticks at 0, 2000, 4000)]

Scatterplot example code: split according to smoking

• aes(): col, shape, size

library(ggplot2)

ggplot(lbw, aes(x = age, y = bwt)) +

geom_point(aes(col = smoke)) +
geom_smooth(method = "lm", se = FALSE) +
labs( x = "Age mother",
y = "Birth weight",
color = "Smoking")

Scatterplot split according to smoking

[Figure: scatterplot; y-axis: Birth weight (ticks from 1000 to 5000); points coloured by Smoking: No, Yes]

Histogram

library(ggplot2)

ggplot(data = lbw) +
geom_histogram(aes(x = age, y = ..count.., fill = race)) +
labs(x = "Age mother",
y = "Number",
fill = "Race")

Histogram

[Figure: histogram; y-axis: Number; fill legend Race: White, Black, Other]

Boxplot

library(ggplot2)

ggplot(data = lbw) +
geom_boxplot(aes(x = race, y = age, fill = smoke)) +
labs(x = "Race mother",
y = "Age mother",
fill = "Smoking")

Boxplot

[Figure: boxplot; y-axis: Age mother; fill legend Smoking: No, Yes]

Subplots

library(ggplot2)

ggplot(data = lbw) +
geom_histogram(aes(x = age, y = ..count..)) +
labs(x = "Age mother",
y = "Number") +
facet_wrap( ~ race, nrow = 1)

Histogram split by race

[Figure: three histogram panels (White, Black, Other); x-axis: Age mother (ticks at 20, 30, 40); y-axis: Number]

Exercise session 3
Simple models
Example: relation between FEV and age
[Figure: scatterplot of fev (y-axis, ticks from 1 to 6) against age (x-axis, ticks at 5, 10, 15)]

Linear regression

• function lm()
• model formula: response ~ x1 + x2 + ...
• the intercept is estimated automatically
• rows with missing values (NAs) are discarded
library(readr) # read_csv()
library(modelr) # add_predictions(), add_residuals()

fevDat <- read_csv(file = "data/fev.csv")
model <- lm(fev ~ age, data = fevDat)

summary(model) # model summary
coefficients(model) # estimated coefficients (betas)
nobs(model) # number of observations used (rows without NAs)
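
The points above (automatic intercept, dropped NA rows) can be checked on data where the true line is known. A minimal base R sketch; the data are made up so that y is roughly 2 + 3 * x:

```r
# Hypothetical data: a straight line plus a small alternating disturbance
x <- 1:10
y <- 2 + 3 * x + rep(c(-0.1, 0.1), times = 5)
toy <- data.frame(x = x, y = y)

toy_model <- lm(y ~ x, data = toy)   # intercept is added automatically

b     <- coef(toy_model)             # intercept and slope, close to 2 and 3
n_all <- nobs(toy_model)             # all 10 rows are used

# A row with a missing response is dropped without an error
toy_na <- toy
toy_na$y[1] <- NA
n_na <- nobs(lm(y ~ x, data = toy_na))
```

Comparing `nobs()` with `nrow()` of the data is a quick way to see how many rows were discarded because of missing values.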

Output lm

##
## Call:
## lm(formula = fev ~ age, data = fevDat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.57539 -0.34567 -0.04989 0.32124 2.12786
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.431648 0.077895 5.541 4.36e-08 ***
## age 0.222041 0.007518 29.533 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5675 on 652 degrees of freedom
## Multiple R-squared: 0.5722, Adjusted R-squared: 0.5716
## F-statistic: 872.2 on 1 and 652 DF, p-value: < 2.2e-16

Predictions

fevDat <- add_predictions(fevDat, model)

temp2 <- tibble(age = c(25,35,45,55,65))
temp2 <- add_predictions(temp2, model)
temp2

## # A tibble: 5 x 2
## age pred
## <dbl> <dbl>
## 1 25 5.98
## 2 35 8.20
## 3 45 10.4
## 4 55 12.6
## 5 65 14.9
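
A prediction from a simple linear model is just intercept + slope * x. The sketch below shows this with base R's `predict()` on made-up data (not the FEV data); `toy_model` and `new_points` are illustrative names:

```r
x <- 1:10
y <- 2 + 3 * x + rep(c(-0.1, 0.1), times = 5)
toy_model <- lm(y ~ x, data = data.frame(x = x, y = y))

# New data must use the same column name as the predictor in the formula
new_points <- data.frame(x = c(2.5, 20))
new_points$pred <- predict(toy_model, newdata = new_points)

# Same numbers by hand: intercept + slope * x
b <- coef(toy_model)
by_hand <- b[["(Intercept)"]] + b[["x"]] * new_points$x
```

Here x = 20 lies outside the observed range of 1 to 10. In the same way, ages 25 to 65 appear to lie outside the ages in the FEV scatterplot (its axis runs to about 15), so such predictions extrapolate the fitted line and should be read with caution.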

Residuals

• add_residuals() adds a column resid with the residual of each observation

fevDat <- add_residuals(fevDat, model)

## histogram residuals
ggplot(fevDat) + geom_histogram(aes(x = resid))
## residualplot
ggplot(fevDat) + geom_point(aes(x = pred, y = resid))
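
A residual is the observed value minus the predicted value. A minimal base R sketch on made-up data that computes residuals by hand and compares them with `resid()`:

```r
x <- 1:10
y <- 2 + 3 * x + rep(c(-0.1, 0.1), times = 5)
toy <- data.frame(x = x, y = y)
toy_model <- lm(y ~ x, data = toy)

toy$pred  <- fitted(toy_model)      # predicted value for each row
toy$resid <- toy$y - toy$pred       # residual = observed - predicted

# resid() returns the same values directly
same <- all.equal(unname(resid(toy_model)), unname(toy$resid))

# With an intercept in the model, the residuals sum to (numerically) zero
resid_sum <- sum(toy$resid)
```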

Logistic regression

model <- glm(low ~ race + smoke + ht + ui + age + lwt,
             data = lbw, family = "binomial")

summary(model)$coefficients # Beta's

## Estimate Std. Error z value Pr(>|z|)

## (Intercept) 0.43724022 1.191931228 0.3668334 0.713743272
## raceBlack 1.28064059 0.526694968 2.4314654 0.015037885
## raceOther 0.90188006 0.434362303 2.0763313 0.037863316
## smokeYes 1.02757057 0.393930669 2.6085061 0.009093838
## htYes 1.85761692 0.688848290 2.6966996 0.007003041
## uiYes 0.89538678 0.448493792 1.9964307 0.045887062
## age -0.01825600 0.035354134 -0.5163752 0.605592417
## lwt -0.01628503 0.006858566 -2.3744076 0.017577136
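
With family = "binomial" and the default logit link, the estimates are on the log-odds scale. Exponentiating them gives odds ratios, which are often easier to read. A small sketch that copies the estimates from the table above into a plain vector (`log_odds` is an illustrative name):

```r
# Estimates copied from the coefficient table above (log-odds scale)
log_odds <- c(raceBlack = 1.28064059, raceOther = 0.90188006,
              smokeYes  = 1.02757057, htYes     = 1.85761692,
              uiYes     = 0.89538678, age       = -0.01825600,
              lwt       = -0.01628503)

# exp() turns a log-odds coefficient into an odds ratio
odds_ratio <- exp(log_odds)
round(odds_ratio, 2)
```

An odds ratio above 1 points to higher odds of low = "Yes", a value below 1 to lower odds. With the fitted model at hand, `exp(coef(model))` gives the same numbers.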

Exercise session 4
Information on the web

• www.stackoverflow.com
• www.r-project.org
• www.rweekly.org
• www.r-bloggers.com
• cheatsheets (see also Help > Cheatsheets)

End
