An Introduction to Data Analysis Visualization Using R

The document outlines a seminar on data analysis and visualization using R, focusing on R packages such as ggplot2 and agricolae. It includes a brief introduction to R and R Studio, programming basics, and practical examples of data manipulation and visualization techniques. Participants will learn to create various types of plots and manage datasets using built-in functions and packages.

Uploaded by

gab.mercado

04/22/2025

HORT 299/399 Special Seminar

An Introduction to
Data Analysis &
Visualization Using R

• a brief introduction to R and RStudio
• focus on R packages:
  • ggplot2
  • agricolae
• for each package, we will cover:
  • short background
  • general syntax
  • commonly used functions with examples (using built-in R datasets)

HORT 299/399: An Introduction to Data Analysis and Visualization Using R


2
© CDVDelaCruz


R
• a programming language widely used for statistical computing and analysis
• supports data manipulation, calculation, and graphical display
• developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand
• open-source and free
• cross-platform
• highly extensible


RStudio
• multi-functional open-source IDE (integrated development environment)
• graphical front-end to R (R version 3.0.1 or higher)
• also supports other languages (e.g., Python and SQL)
• features:
  • code auto-completion
  • versatile and customizable
  • writing and saving reusable scripts
  • command history tracking
  • extensive built-in help


Assignment Operator
# uses '<-' symbol to assign values to objects
x <- 5

Comments
# anything following '#' on a line is ignored by R
# allows you to include explanatory notes in your code

Vectors
vector <- c(1, 2, 3)
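
As a minimal sketch of how vectors behave, most operations in R are applied to every element at once. The object names and values below (`heights_cm`, `tall`) are made up for illustration.

```r
# hypothetical measurements, for illustration only
heights_cm <- c(57.9, 62.1, 53.7)

# arithmetic is applied to every element at once
heights_m <- heights_cm / 100

# square brackets select elements by position (indexing starts at 1)
first_height <- heights_cm[1]

# a logical condition can be used to filter a vector
tall <- heights_cm[heights_cm > 55]

print(heights_m)
print(tall)
```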


Data frame
df <- data.frame(Name=c("Jhoana", "Pablo"), Score=c(90, 85))

Commands and Functions


mean(df$Score)
## [1] 87.5
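
The same data frame can be inspected and extended with a few more base R functions. This is only a sketch; the `Percent` column and the `top_name` object are hypothetical additions, not part of the original example.

```r
# same example data frame as above
df <- data.frame(Name = c("Jhoana", "Pablo"), Score = c(90, 85))

str(df)    # column names and types
nrow(df)   # number of rows

# add a new column computed from an existing one (hypothetical column name)
df$Percent <- df$Score / 100

# pick the name on the row with the highest score
top_name <- as.character(df$Name[which.max(df$Score)])
print(top_name)
```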


Package Installation
# install the 'ggplot2' and 'agricolae' packages (needed only once)
install.packages("ggplot2")
install.packages("agricolae")

Package Loading
# load the packages (needed in every new R session)
library(ggplot2)
library(agricolae)
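
Because installation is needed only once, a script may check first and install only what is missing. The helper `missing_packages()` below is a hypothetical name introduced for this sketch; it relies on the base function `requireNamespace()`.

```r
# hypothetical helper: returns the names of packages that are not yet installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ggplot2", "agricolae"))

# install only when something is missing and R is running interactively
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```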


Built-in Datasets
library(agricolae) # load the package containing the built-in dataset
data(yacon)        # load the built-in dataset
head(yacon, 3)     # display the first 3 rows of the dataset

##   locality site dose entry replication height stalks  wfr  wff  wfk roots  FOS
## 1      CAJ    1   F0 P1385           1   57.9    3.2 6700 3600 5490  21.9 60.2
## 2      CAJ    1   F0 P1385           2   62.1    3.0 6450 2500 3800  21.3 60.4
## 3      CAJ    1   F0 P1385           3   53.7    2.6 7550 2450 5360  21.8 54.5
##   glucose fructose sucrose brix foliage  dry     IH
## 1    1.74      5.0   26.66 14.8    25.6 15.6 0.3469
## 2    1.50      4.5   28.84 15.7    27.2 17.0 0.3257
## 3    2.14      9.3   27.06 14.3    30.0 15.9 0.3221
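
Beyond `head()`, a few base R functions give a quick overview of any dataset. The sketch below uses `iris`, which ships with base R, so that it runs without installing agricolae.

```r
data(iris)   # 'iris' comes with base R (package 'datasets')

dim(iris)                    # number of rows and columns
str(iris)                    # type of each column
summary(iris$Sepal.Length)   # numeric summary of one column
levels(iris$Species)         # categories of a factor column
```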


Base R
# imports a CSV file using a built-in function from base R
data_csv <- read.csv("path/to/your/data.csv")
head(data_csv)

R Package
# imports an Excel file using the 'readxl' package
install.packages("readxl")
library(readxl)

data_excel <- read_excel("path/to/your/data.xlsx")
head(data_excel)
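
To try the CSV import without a file of your own, a small round trip can be sketched with a temporary file. The file is created by `tempfile()`, so no real path is assumed; `csv_path` is a name chosen for this example.

```r
# write the first 5 rows of a built-in dataset to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(head(iris, 5), csv_path, row.names = FALSE)

# read it back with the same base R function used above
data_csv <- read.csv(csv_path, stringsAsFactors = TRUE)
str(data_csv)

# remove the temporary file
unlink(csv_path)
```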


ggplot2
• data visualization package built on the grammar of graphics
• creates complex and multi-layered graphics easily
General Syntax
ggplot(data, aes(x, y)) +
  geom_<type>() +
  theme_<style>() +
  labs(title, x, y)
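
The template above can be filled in as follows. This is one possible instance, using the built-in `mtcars` dataset; the choice of `geom_point()`, `theme_bw()`, and the label text is only for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + # data and aesthetic mapping
  geom_point() +                            # geometry layer
  theme_bw() +                              # overall theme
  labs(title = "Fuel Economy vs Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon")              # title and axis labels

print(p)
```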


ggplot2 iris

Create the Base Plot


# create the base boxplot
ggplot(iris, aes(x = Species,
                 y = Sepal.Length)) +
  geom_boxplot()
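
Each box drawn by `geom_boxplot()` spans the first to third quartile, with a line at the median. As a rough check on what the plot shows, the same quantities can be computed in base R; `box_stats` is a name chosen for this sketch.

```r
# quartiles of sepal length within each species
box_stats <- sapply(split(iris$Sepal.Length, iris$Species),
                    quantile, probs = c(0.25, 0.5, 0.75))

# rows are the 25%, 50% (median) and 75% points; columns are species
print(box_stats)
```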


ggplot2 iris
ggplot(iris, aes(x = Species,
                 y = Sepal.Length,
                 fill = Species)) + # color boxes according to Species
  geom_boxplot() +
  labs(title = "Sepal Length by Species",
       x = "Species",
       y = "Sepal Length (cm)") # add labels


ggplot2 iris
ggplot(iris, aes(x = Species,
                 y = Sepal.Length,
                 fill = Species)) + # color boxes according to Species
  geom_boxplot(width = 0.5) + # modify box width
  labs(title = "Sepal Length by Species",
       x = "Species",
       y = "Sepal Length (cm)") # add labels


ggplot2 iris
ggplot(iris, aes(x = Species,
                 y = Sepal.Length,
                 fill = Species)) + # color boxes according to Species
  geom_boxplot(width = 0.5) + # modify box width
  labs(title = "Sepal Length by Species",
       x = "Species",
       y = "Sepal Length (cm)") + # add labels
  geom_jitter() # add data points


ggplot2 iris
ggplot(iris, aes(x = Species,
                 y = Sepal.Length,
                 fill = Species)) + # color boxes according to Species
  geom_boxplot(width = 0.5) + # modify box width
  labs(title = "Sepal Length by Species",
       x = "Species",
       y = "Sepal Length (cm)") + # add labels
  geom_jitter(alpha = 0.25) # add data points, modify transparency


ggplot2 Soybean

Create the Base Plot


# create the base line graph
ggplot(soybean, aes(x = Time,
                    y = weight)) +
  geom_line()
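
The slides do not show where the `soybean` data frame comes from. One possibility, stated here purely as an assumption, is that it holds per-variety averages of the `Soybean` dataset in the nlme package, which has the columns `Variety`, `Time`, and `weight`. A sketch of that preparation step:

```r
# ASSUMPTION: 'soybean' is built from nlme::Soybean; the slides do not confirm this
soy_raw <- as.data.frame(nlme::Soybean)

# average leaf weight for each variety at each sampling time
soybean <- aggregate(weight ~ Variety + Time, data = soy_raw, FUN = mean)

head(soybean)
```

With one averaged value per variety and time point, `geom_line()` can draw a single line per variety once `Variety` is mapped to an aesthetic such as `color` or `linetype`.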


ggplot2 Soybean
ggplot(soybean, aes(x = Time,
                    y = weight,
                    color = Variety)) + # add lines with different colors for each variety
  geom_line() +
  labs(title = "Average Leaf Weight Over Time",
       x = "Days after planting (DAP)",
       y = "Average Leaf Weight (g)") # add labels


ggplot2 Soybean
ggplot(soybean, aes(x = Time,
                    y = weight,
                    color = Variety)) + # add lines with different colors for each variety
  geom_point() + # add data points
  geom_line() +
  labs(title = "Average Leaf Weight Over Time",
       x = "Days after planting (DAP)",
       y = "Average Leaf Weight (g)") # add labels


ggplot2 Soybean
ggplot(soybean, aes(x = Time,
                    y = weight,
                    color = Variety)) + # add lines with different colors for each variety
  geom_point(shape=18, size=3) + # add data points, modify points
  geom_line() +
  labs(title = "Average Leaf Weight Over Time",
       x = "Days after planting (DAP)",
       y = "Average Leaf Weight (g)") # add labels


ggplot2 Soybean
ggplot(soybean, aes(x = Time,
                    y = weight,
                    color = Variety)) + # add lines with different colors for each variety
  geom_point(shape=18, size=3) + # add data points, modify points
  geom_line(linewidth=1.2) + # make lines thicker
  labs(title = "Average Leaf Weight Over Time",
       x = "Days after planting (DAP)",
       y = "Average Leaf Weight (g)") # add labels


ggplot2 Soybean
ggplot(soybean, aes(x = Time,
                    y = weight,
                    linetype = Variety)) + # add lines with different linetypes for each variety
  geom_point(shape=18, size=3) + # add data points, modify points
  geom_line(linewidth=1.2) + # make lines thicker
  labs(title = "Average Leaf Weight Over Time",
       x = "Days after planting (DAP)",
       y = "Average Leaf Weight (g)") # add labels


ggplot2 Soybean
ggplot(soybean, aes(x = Time,
                    y = weight,
                    linetype = Variety)) + # add lines with different linetypes for each variety
  geom_point(shape=18, size=3) + # add data points, modify points
  geom_line(linewidth=1.2) + # make lines thicker
  scale_linetype_manual(values = c("dotdash", "solid")) + # specify line types
  labs(title = "Average Leaf Weight Over Time",
       x = "Days after planting (DAP)",
       y = "Average Leaf Weight (g)") # add labels


ggplot2 yacon

Create the Base Plot


# create the base scatter plot
ggplot(yacon, aes(x = stalks,
                  y = FOS)) +
  geom_point()


ggplot2 yacon
ggplot(yacon, aes(x = stalks,
                  y = FOS)) +
  geom_point() +
  labs(title = "FOS vs Stalk Count",
       x = "Number of Stalks",
       y = "FOS (%)") # add labels


ggplot2 yacon
ggplot(yacon, aes(x = stalks,
                  y = FOS)) +
  geom_point(color = "darkgreen") + # change color of points
  labs(title = "FOS vs Stalk Count",
       x = "Number of Stalks",
       y = "FOS (%)") # add labels


ggplot2 yacon
ggplot(yacon, aes(x = stalks,
                  y = FOS,
                  color = entry)) + # color by grouping variable (entry)
  geom_point() +
  labs(title = "FOS vs Stalk Count",
       x = "Number of Stalks",
       y = "FOS (%)") # add labels


ggplot2 yacon
ggplot(yacon, aes(x = stalks,
                  y = FOS,
                  color = entry)) + # color by grouping variable (entry)
  geom_point() +
  labs(title = "FOS vs Stalk Count",
       x = "Number of Stalks",
       y = "FOS (%)") + # add labels
  scale_color_manual(values = c("maroon", "forestgreen", "sienna",
                                "tomato", "steelblue", "mediumorchid",
                                "rosybrown", "orange")) # specify one color per entry


ggplot2 yacon
# load required package
library(RColorBrewer)

ggplot(yacon, aes(x = stalks,
                  y = FOS,
                  color = entry)) + # color by grouping variable (entry)
  geom_point() +
  labs(title = "FOS vs Stalk Count",
       x = "Number of Stalks",
       y = "FOS (%)") + # add labels
  scale_color_brewer(palette = "Dark2") # use a pre-defined color palette from RColorBrewer
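
A pre-defined palette has a fixed maximum number of colors, which matters when the grouping variable has many levels. The sketch below looks up that limit for "Dark2" and lists the colors behind it; `max_colors` and `dark2` are names chosen for this example.

```r
library(RColorBrewer)

# maximum number of colors the "Dark2" palette provides
max_colors <- brewer.pal.info["Dark2", "maxcolors"]
print(max_colors)

# the hex codes behind the palette
dark2 <- brewer.pal(max_colors, "Dark2")
print(dark2)
```

The same hex codes could also be passed to `scale_color_manual(values = dark2)` when individual colors need to be reordered or replaced.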


ggplot2
library(RColorBrewer)
ggplot(yacon, aes(x = stalks,
                  y = FOS,
                  color = entry)) + # color by grouping variable (entry)
  geom_point() +
  labs(title = "FOS vs Stalk Count",
       x = "Number of Stalks",
       y = "FOS (%)") + # add labels
  scale_color_brewer(palette = "Dark2") +
  theme_classic() # apply classic theme


ggplot2
library(RColorBrewer)
ggplot(yacon, aes(x = stalks,
                  y = FOS,
                  color = entry)) + # color by grouping variable (entry)
  geom_point() +
  labs(title = "FOS vs Stalk Count",
       x = "Number of Stalks",
       y = "FOS (%)") + # add labels
  scale_color_brewer(palette = "Dark2") +
  theme_light() # apply light theme


ggplot2
library(RColorBrewer)
ggplot(yacon, aes(x = stalks,
                  y = FOS,
                  color = entry)) + # color by grouping variable (entry)
  geom_point() +
  labs(title = "FOS vs Stalk Count",
       x = "Number of Stalks",
       y = "FOS (%)") + # add labels
  scale_color_brewer(palette = "Dark2") +
  theme_minimal() # apply minimal theme


ggplot2
library(RColorBrewer)
ggplot(yacon, aes(x = stalks,
                  y = FOS,
                  color = entry)) + # color by grouping variable (entry)
  geom_point() +
  labs(title = "FOS vs Stalk Count",
       x = "Number of Stalks",
       y = "FOS (%)") + # add labels
  scale_color_brewer(palette = "Dark2") +
  theme_bw() # apply black-and-white theme


ggplot2 facets
• useful for creating multi-panel plots by grouping data
• uses the functions:
  • facet_wrap()
  • facet_grid()


ggplot2 facet_wrap()
ggplot(yacon, aes(x = stalks,
                  y = FOS,
                  color = entry)) +
  geom_point() +
  labs(title = "FOS vs Stalk Count",
       x = "Number of Stalks",
       y = "FOS (%)") +
  scale_color_brewer(palette = "Dark2") +
  theme_bw() +
  facet_wrap(~locality) # creates a panel for each location


ggplot2 facet_wrap()
ggplot(yacon, aes(x = stalks,
                  y = FOS,
                  color = entry)) +
  geom_point() +
  labs(title = "FOS vs Stalk Count",
       x = "Number of Stalks",
       y = "FOS (%)") +
  scale_color_brewer(palette = "Dark2") +
  theme_bw() +
  facet_wrap(~locality, ncol=1) # stacks panels in a column


ggplot2 facet_grid()
ggplot(yacon, aes(x = stalks,
                  y = FOS,
                  color = entry)) +
  geom_point() +
  labs(title = "FOS vs Stalk Count",
       x = "Number of Stalks",
       y = "FOS (%)") +
  scale_color_brewer(palette = "Dark2") +
  theme_bw() +
  facet_grid(dose~locality) # creates a panel for each dose x location


agricolae
• provides functions for experimental design and analysis of
agricultural experiments
• planning of field experiments
General Syntax
design.<design_type>(trt, r, serie, seed)


agricolae
# create a vector defining your treatments
treatment <- c("A", "B", "C")
# create CRD layout with 10 replicates per treatment level
design_crd <- design.crd(treatment, r=10, seed=123, serie=2)
# extract the design book
book_crd <- design_crd$book
# view first 3 rows of the design book
head(book_crd, 3)

##   plots r treatment
## 1   101 1         C
## 2   102 1         A
## 3   103 2         A
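
Conceptually, a completely randomized design assigns every treatment copy to a plot in random order. The base R sketch below illustrates that idea only; it is not agricolae's implementation, and its random allocation will not match the design book printed above. `crd_sketch` and `n_rep` are names chosen for this example.

```r
treatment <- c("A", "B", "C")
n_rep <- 10   # replicates per treatment

set.seed(123) # make the randomization reproducible

# one row per plot: plot numbers 101, 102, ... with shuffled treatments
crd_sketch <- data.frame(
  plots = 100 + seq_len(length(treatment) * n_rep),
  treatment = sample(rep(treatment, each = n_rep))
)

head(crd_sketch, 3)
table(crd_sketch$treatment)   # each treatment appears n_rep times
```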


agricolae
# create a vector defining your treatments
treatment <- c("A", "B", "C", "D", "E")

# set number of replicates (blocks)
n_rep <- 4

# create RCBD layout
design_rcbd <- design.rcbd(treatment, n_rep, seed=123, serie=2)

# apply zigzag (serpentine) plot numbering
book_rcbd <- zigzag(design_rcbd)


agricolae
# display RCBD layout
print(design_rcbd$sketch)
## [,1] [,2] [,3] [,4] [,5]
## [1,] "A" "C" "D" "E" "B"
## [2,] "B" "A" "E" "C" "D"
## [3,] "A" "D" "B" "E" "C"
## [4,] "E" "C" "D" "B" "A"
# display plot numbers
print(matrix(book_rcbd[,1],byrow = TRUE, ncol = 5))
## [,1] [,2] [,3] [,4] [,5]
## [1,] 101 102 103 104 105
## [2,] 205 204 203 202 201
## [3,] 301 302 303 304 305
## [4,] 405 404 403 402 401
# save the design to a csv file
write.csv(book_rcbd, "book_rcbd.csv", row.names=FALSE)
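
The serpentine numbering printed above follows a simple rule: plots in odd-numbered blocks run left to right, and plots in even-numbered blocks run right to left. The hypothetical helper `serpentine_plots()` below is a base R sketch of that rule, not the code used inside `zigzag()`.

```r
# hypothetical helper: plot numbers for a serpentine (zigzag) field layout
serpentine_plots <- function(n_blocks, n_plots, serie = 2) {
  base <- 10^serie   # serie = 2 gives numbers such as 101, 102, ...
  layout <- matrix(NA_integer_, nrow = n_blocks, ncol = n_plots)
  for (b in seq_len(n_blocks)) {
    ids <- as.integer(b * base + seq_len(n_plots))
    if (b %% 2 == 0) ids <- rev(ids)   # reverse every second block
    layout[b, ] <- ids
  }
  layout
}

plot_map <- serpentine_plots(n_blocks = 4, n_plots = 5)
print(plot_map)
```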


agricolae
# create a vector for your treatments
treatment <- c("A", "B", "C", "D")

# generate a Latin square design (LSD) layout using the specified treatments
design_lsd <- design.lsd(treatment, seed=123, serie=2)

# display the LSD layout
print(design_lsd$sketch)
## [,1] [,2] [,3] [,4]
## [1,] "C" "A" "B" "D"
## [2,] "B" "D" "A" "C"
## [3,] "D" "B" "C" "A"
## [4,] "A" "C" "D" "B"


agricolae
# apply zigzag plot numbering to the design layout
book_lsd <- zigzag(design_lsd)

# display the plot layout in a 4-column matrix


print(matrix(book_lsd[,1],byrow = TRUE, ncol = 4))
## [,1] [,2] [,3] [,4]
## [1,] 101 102 103 104
## [2,] 204 203 202 201
## [3,] 301 302 303 304
## [4,] 404 403 402 401
# save the design to a CSV file
write.csv(book_lsd, "book_lsd.csv", row.names=FALSE)


agricolae
• design.ab(): factorial experiments
• design.split(): split-plot experiments
• design.strip(): strip-plot experiments
• design.alpha(): alpha-lattice experiments


agricolae
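data(yacon) # load the built-in dataset
head(yacon, 3) # display first 3 rows
##   entry replication roots
## 1 P1385           1  21.9
## 2 P1385           2  21.3
## 3 P1385           3  21.8
# note: only three columns are printed here, and the ANOVA table for this
# model has 23 total degrees of freedom (24 observations), so the data
# appear to have been subset beforehand; that step is not shown

# declare categorical variables as factors
yacon$entry <- as.factor(yacon$entry)
yacon$block <- as.factor(yacon$replication)

# define your model
model <- aov(roots ~ block + entry, data = yacon)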


agricolae
# display anova table
summary(model)

##             Df Sum Sq Mean Sq F value   Pr(>F)
## block        2   1.18   0.590   1.747     0.21
## entry        7  48.37   6.909  20.440 2.47e-06 ***
## Residuals   14   4.73   0.338
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
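
To see how the table is put together, each F value is the mean square of a term divided by the residual mean square, and the p-value is the upper tail of the F distribution. The sketch below redoes that arithmetic from the rounded values printed above, so its results agree with the table only approximately.

```r
# values copied from the ANOVA table above (already rounded)
ms_block <- 0.590
ms_entry <- 6.909
ms_resid <- 0.338
df_block <- 2
df_entry <- 7
df_resid <- 14

# F value = mean square of the term / residual mean square
f_block <- ms_block / ms_resid
f_entry <- ms_entry / ms_resid

# p-value = probability of an F value at least this large
p_block <- pf(f_block, df_block, df_resid, lower.tail = FALSE)
p_entry <- pf(f_entry, df_entry, df_resid, lower.tail = FALSE)

print(c(f_block = f_block, f_entry = f_entry))
print(c(p_block = p_block, p_entry = p_entry))
```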


agricolae
# perform Tukey HSD test
hsd_result <- HSD.test(model, "entry", group = TRUE)

# view groupings
print(hsd_result$groups)

##              roots groups
## AMM5150   23.03333      a
## AKW5075   22.43333      a
## AMM5163   22.16667     ab
## P1385     21.66667     ab
## AMM5136   20.73333     bc
## ARB5125   19.63333      c
## SAL136    19.43333      c
## CLLUNC118 19.06667      c
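
The letter groups come from comparing each pair of means against Tukey's honestly significant difference. As a rough sketch, that threshold can be computed in base R from the residual mean square and degrees of freedom in the ANOVA table. The value `n_rep <- 3` is an assumption based on the three replications per entry shown earlier.

```r
ms_resid  <- 0.338   # residual mean square from the ANOVA table
df_resid  <- 14      # residual degrees of freedom
n_entries <- 8       # number of entries being compared
n_rep     <- 3       # ASSUMPTION: observations per entry

# critical value of the studentized range at the 5% level
q_crit <- qtukey(0.95, nmeans = n_entries, df = df_resid)

# minimum difference between two means to be declared different
hsd <- q_crit * sqrt(ms_resid / n_rep)
print(hsd)

# example: compare two of the means listed above
diff_means <- 23.03333 - 20.73333   # AMM5150 vs AMM5136
print(diff_means > hsd)
```

Means whose difference is smaller than this threshold share at least one letter in the groups column.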


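• Use the help() function and the ? and ?? help operators
help(ggplot)  # open the help page for a function
?geom_boxplot # shorthand for help(geom_boxplot)
??geom_point  # search the installed help pages for a term
• Refer to the package documentation
• Data visualization: https://r-graph-gallery.com/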
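
Two more base R tools can help when the exact function name or its arguments are not known. The sketch below uses `read.csv` as the example; `read_funs` and `csv_args` are names chosen for illustration.

```r
# find functions whose names match a pattern
read_funs <- apropos("^read\\.")
print(read_funs)

# show the arguments a function accepts
print(args(read.csv))

# the argument names as a character vector
csv_args <- names(formals(read.csv))
print(csv_args)
```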
