Lab3Instructions Knitr

This lab focuses on reading data into R and performing descriptive statistics, specifically using a dataset on bat brain sizes. It covers setting a working directory, reading CSV files, creating histograms, calculating mean, median, and standard deviation, and producing boxplots and scatter plots. The lab emphasizes the importance of visualizing data and understanding the distribution of variables within different bat families.

Uploaded by

Jai Calatrava

Lab #3

2024-09-16

In this lab we are going to learn how to read data into R and perform some descriptive statistics.
First, set a working directory. You can type the command yourself or use the menu in R-Studio:
Session > Set Working Directory > Choose Directory... Note that your data file will not be visible in the
dialog. Navigate to the folder it resides within and hit ‘Open’.
R-Studio will then post something similar to the line below in your console.

setwd("~/Library/CloudStorage/OneDrive-DePaulUniversity/DePaul/Teaching/2025WQ/BIO206/Labs/Day_4")

Is my file in the directory I just selected?

list.files()

## [1] "batbrains.csv"              "batbrainsFull.csv"
## [3] "Day_4_Script.R"             "Lab3Instructions_Knitr.pdf"
## [5] "Lab3Instructions_Knitr.Rmd" "LabWorksheet_3_Complete.docx"
## [7] "LabWorksheet_3_KEY.pdf"     "LabWorksheet_3.docx"
## [9] "Worksheet_Hist.pdf"         "Worksheet_scatter.pdf"
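
Instead of scanning the list by eye, you can ask R directly whether a file is there. The lines below are a minimal sketch of that check. They work in a temporary folder with a placeholder file so they run anywhere; the names demo_dir and demo_file and the file name batbrains_demo.csv are assumptions made for illustration, not part of the lab.

```r
# Work in a temporary folder so the sketch does not touch your lab files
demo_dir <- tempdir()
demo_file <- file.path(demo_dir, "batbrains_demo.csv")  # hypothetical file name

# Create a small placeholder file
writeLines("species,family", demo_file)

# file.exists() returns TRUE or FALSE for each path you give it
found <- file.exists(demo_file)
not_found <- file.exists(file.path(demo_dir, "not_a_real_file.csv"))

# list.files() can also filter by a pattern, here anything ending in .csv
csv_files <- list.files(demo_dir, pattern = "\\.csv$")
```

In your own session you would call file.exists("batbrains.csv") after setting the working directory.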

The data is in CSV (comma-separated values) format. You can see the commas if you open this file in a text
editor.
A CSV file holds only text, so if you save to this format from Excel, any plots are dropped and only the data
are kept.
Read in our data and give it a meaningful name. Here I have named it BatData. BatData is now an
object in R.

BatData<-read.csv("batbrains.csv")
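
After reading a file, it is worth checking that the import looks right before plotting anything. The sketch below shows one way to do that with head, str, and dim. Because batbrains.csv is not bundled here, it writes a small made-up table to a temporary CSV first; the values and the names toy, toy_path, and ToyData are invented for illustration and are not the lab's data.

```r
# Made-up values for illustration only; these are NOT the lab's measurements
toy <- data.frame(
  species = c("sp_a", "sp_b", "sp_c", "sp_d"),
  family = c("Hipposideridae", "Hipposideridae", "Molossidae", "Molossidae"),
  brain_size = c(150, 420, 310, 505)
)

# Write the toy data to a temporary CSV, then read it back the way the lab does
toy_path <- tempfile(fileext = ".csv")
write.csv(toy, toy_path, row.names = FALSE)
ToyData <- read.csv(toy_path)

# Three quick checks that the import worked as expected
head(ToyData)  # first rows
str(ToyData)   # column names and types
dim(ToyData)   # number of rows, then number of columns
```

With the real file, the same three calls on BatData show you which column names are available for the plots that follow.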

Let’s examine some data and plot a grouped frequency distribution. We’ll begin with a histogram.
In the code chunk below, the dollar sign allows us to access a variable directly by name. You can also use
indexing (i.e., X[,3]) to access a variable by position. After you type the $, R-Studio lists the available
variables, and you can click one as a shortcut. We add two other arguments separated by commas, xlab and main.
xlab lets us change the x-axis label.
main is the title. I set it to NULL, which removes the title.
What does the distribution look like?

hist(BatData$brain_size, xlab="Brain Size (mm3)", main = NULL)

[Figure: histogram of brain size. x-axis: Brain Size (mm3); y-axis: Frequency]
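
The two ways of reaching a variable mentioned above, the dollar sign and indexing, can be compared side by side. This is a small sketch on a made-up data frame; toy and its values are assumptions for illustration only.

```r
# Made-up values for illustration only
toy <- data.frame(
  family = c("Hipposideridae", "Molossidae", "Molossidae"),
  brain_size = c(150, 310, 505)
)

by_dollar <- toy$brain_size     # by name with $
by_position <- toy[, 2]         # by column position (second column)
by_name <- toy[, "brain_size"]  # by column name inside the brackets

# All three return the same numeric vector
identical(by_dollar, by_position)
identical(by_dollar, by_name)
```

Accessing by name is usually safer than by position, because the code still works if the column order in the file changes.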

What if I want to see separate histograms for each bat family? I can achieve this using the subset function.
The subset function will create two new data frames that I have named Hip and Mol based on the names of
the bat families.

Hip<-subset(BatData, subset = BatData$family == "Hipposideridae")

Mol<-subset(BatData, subset = BatData$family == "Molossidae")

The subset function uses a logical statement (here built with the == symbol) that is evaluated for every row
and returns either TRUE or FALSE. subset keeps only the rows for which the statement is TRUE.
The first line returns all the data for which the family column entry reads ‘Hipposideridae’; the second line
does the same for ‘Molossidae’.
I also add some formatting to the plots using the par function. The pty argument allows me to make the
plot region square, and the mfrow argument allows me to define how I want the two plots arranged. In this case I
ask for 1 row and 2 columns. Note that mfrow needs two values, which is why I use the c function to join
the two values together in a vector.

par(pty='s', mfrow=c(1,2))
hist(Hip$brain_size, xlab="Hipposideridae Brain Size (mm3)", main = NULL)
hist(Mol$brain_size, xlab="Molossidae Brain Size (mm3)", main = NULL)

[Figure: two histograms side by side. Left x-axis: Hipposideridae Brain Size (mm3); right x-axis: Molossidae Brain Size (mm3); y-axis: Frequency]
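
To see what the logical statement inside subset is doing, it can help to run the comparison on its own. The sketch below uses a made-up data frame; toy, is_hip, ToyHip, and ToyHip2 are names invented for illustration.

```r
# Made-up values for illustration only
toy <- data.frame(
  family = c("Hipposideridae", "Molossidae", "Hipposideridae", "Molossidae"),
  brain_size = c(150, 310, 420, 505)
)

# The comparison is evaluated once per row and gives TRUE or FALSE
is_hip <- toy$family == "Hipposideridae"
is_hip

# subset() keeps the rows where the condition is TRUE
ToyHip <- subset(toy, family == "Hipposideridae")

# Square-bracket indexing with the logical vector keeps the same rows
ToyHip2 <- toy[is_hip, ]
```

Inside subset you can write the column name without the data frame and $, because subset looks the name up in the data frame you pass as the first argument.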

Let’s calculate some other descriptive statistics.

Let’s begin with the mean, median, and standard deviation. Note that the mean (387.3) is larger than the
median (369.6). When these two measures of central tendency differ, there is likely skew in the data; a mean
above the median points to a right (positive) skew.

mean(BatData$brain_size)

## [1] 387.3432

median(BatData$brain_size)

## [1] 369.6

sd(BatData$brain_size)

## [1] 183.1476
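
The link between skew and the gap between mean and median is easy to check on a tiny example. The two vectors below are made up for illustration; they differ only in their largest value.

```r
# Made-up values for illustration only: one large value stretches the right tail
symmetric <- c(10, 20, 30, 40, 50)
right_skewed <- c(10, 20, 30, 40, 200)

mean(symmetric)       # 30
median(symmetric)     # 30

mean(right_skewed)    # 60
median(right_skewed)  # 30
```

The single large value pulls the mean upward while the median stays where it was, which is why a mean above the median hints at a long right tail.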

But what if I wanted a mean for each bat family? I can use the aggregate function.
First, you join all the continuous variables you’re interested in using the function cbind. Then, after the ~
symbol, you tell it which categorical variable to group by, in this case family.
FUN is the ‘function’ we wish to apply to each group (mean, median, or sd).

aggregate(x = cbind(brain_size,amygdala,hippocampus)~family, FUN="mean", data = BatData)

##           family brain_size amygdala hippocampus
## 1 Hipposideridae   380.8600 15.22500    27.04500
## 2     Molossidae   394.9706 19.32941    20.00588

aggregate(x = cbind(brain_size,amygdala,hippocampus)~family, FUN="median", data = BatData)

##           family brain_size amygdala hippocampus
## 1 Hipposideridae     279.05    11.75       19.65
## 2     Molossidae     391.80    19.50       20.80

aggregate(x = cbind(brain_size,amygdala,hippocampus)~family, FUN="sd", data = BatData)

##           family brain_size amygdala hippocampus
## 1 Hipposideridae   213.3609 8.729857   14.930170
## 2     Molossidae   145.9420 6.037256    6.020742
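
The grouping pattern can be checked by hand on a table small enough to work out on paper. The sketch below uses made-up values, and the names toy, group_means, and group_medians are assumptions for illustration. It also shows tapply, a base R alternative that is convenient when there is only one continuous variable.

```r
# Made-up values for illustration only
toy <- data.frame(
  family = c("Hipposideridae", "Hipposideridae", "Hipposideridae",
             "Molossidae", "Molossidae", "Molossidae"),
  brain_size = c(100, 200, 600, 300, 400, 500)
)

# Same pattern as the lab: continuous ~ categorical, with FUN naming the summary
group_means <- aggregate(brain_size ~ family, data = toy, FUN = mean)
group_medians <- aggregate(brain_size ~ family, data = toy, FUN = median)

# tapply() returns a named vector with one value per group
tapply(toy$brain_size, toy$family, mean)
```

In this toy table the Hipposideridae mean (300) sits above its median (200) because of the single value of 600, while the Molossidae mean and median are both 400.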

Boxplots are a great way to illustrate a continuous variable grouped by a discrete variable. They show you
the range (excluding outliers), the interquartile range, and the median of your data. You can see how your
continuous data is distributed.
The general formula for producing a boxplot is as follows: boxplot(continuous~categorical). Or, in other
words, boxplot(Dependent~Independent). You give the boxplot function your whole data frame (data =
BatData), so you do not need to use the $ here.

par(pty='s')
boxplot(hippocampus~family,data = BatData, xlab = "Family", ylab = "Hippocampus (mm3)")
[Figure: boxplots of hippocampus for each family. x-axis: Family (Hipposideridae, Molossidae)]
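
The numbers a boxplot draws can also be computed directly, which helps when reading the plot. The sketch below uses a made-up vector x; boxplot.stats is the base R function that returns the values boxplot uses with its default settings.

```r
# Made-up values for illustration only
x <- c(10, 12, 14, 15, 16, 18, 20, 45)

median(x)                   # the thick line inside the box
quantile(x, c(0.25, 0.75))  # close to the bottom and top of the box
IQR(x)                      # interquartile range based on those quantiles

# boxplot.stats() returns what boxplot() draws:
# $stats = lower whisker, lower hinge, median, upper hinge, upper whisker
# $out   = points plotted individually as outliers
bs <- boxplot.stats(x)
bs$stats
bs$out
```

Here the value 45 lies more than 1.5 times the box height above the box, so it is reported in bs$out and the upper whisker stops at 20. The hinges that boxplot uses can differ slightly from the quartiles returned by quantile in small samples.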

Let’s produce a scatter plot with two continuous variables. I can produce my plot in two different ways. The
first involves the use of the ~ symbol. Here, you place the y variable (dependent) first, then the x variable
(independent).

par(pty='s')
plot(amygdala~brain_size, data = BatData,
xlab = "Brain Size (mm3)", ylab = "Amygdala (mm3)")

[Figure: scatter plot. x-axis: Brain Size (mm3); y-axis: Amygdala (mm3)]

You can also explicitly define x and y. Because there is no data argument in this form, you must use the $
sign to tell R which data frame each variable comes from.
Note that you can get superscript characters in axis labels using the expression function. Its syntax is
fairly complex, so I wanted to introduce it at the very end.

par(pty='s')
plot(x = BatData$brain_size, y = BatData$amygdala,
xlab = expression('Brain Size (mm'^3*')'), ylab = expression('Amygdala (mm'^3*')'))
[Figure: the same scatter plot with superscripts in the axis labels. x-axis: Brain Size (mm^3); y-axis: Amygdala (mm^3)]
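
The two plotting styles and the superscript labels can be tried together on a small example. This is a sketch on made-up values; toy, x_lab, and y_lab are names invented for illustration. Storing each label in an object first keeps the plot call short.

```r
# Made-up values for illustration only
toy <- data.frame(
  brain_size = c(150, 310, 420, 505),
  amygdala = c(6, 12, 17, 22)
)

# Build each label once and reuse it: ^ raises the 3, * joins the pieces
x_lab <- expression("Brain Size (mm"^3 * ")")
y_lab <- expression("Amygdala (mm"^3 * ")")

# Formula interface: y ~ x, with the columns looked up in `data`
plot(amygdala ~ brain_size, data = toy, xlab = x_lab, ylab = y_lab)

# Explicit interface: same points, with the columns supplied using $
plot(x = toy$brain_size, y = toy$amygdala, xlab = x_lab, ylab = y_lab)
```

Both calls draw the same points. The formula version reads like the boxplot call earlier in the lab, while the explicit version makes it obvious which variable is on which axis.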
