Divvy Exercise R Script

The document outlines a data analysis process using R to wrangle and analyze Divvy bike trip data from Q1 2019 and Q1 2020. It includes steps for data collection, cleaning, and preparation, such as renaming columns for consistency, removing unnecessary data, and calculating ride lengths. The final analysis involves descriptive statistics and visualizations of ride data by user type and day of the week, culminating in exporting the summarized data for further analysis.

Uploaded by

pboss16.pp


library(tidyverse) #helps wrangle data

# Use the conflicted package to manage conflicts

library(conflicted)

# Set dplyr::filter and dplyr::lag as the default choices

conflict_prefer("filter", "dplyr")
conflict_prefer("lag", "dplyr")

#=====================
# STEP 1: COLLECT DATA
#=====================
# Upload Divvy datasets (csv files) here
q1_2019 <- read_csv("Divvy_Trips_2019_Q1.csv")
q1_2020 <- read_csv("Divvy_Trips_2020_Q1.csv")

#====================================================
# STEP 2: WRANGLE DATA AND COMBINE INTO A SINGLE FILE
#====================================================
# Compare column names of each of the files
# While the names don't have to be in the same order, they DO need to match perfectly before
# we can use a command to join them into one file
colnames(q1_2019)
colnames(q1_2020)
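
# A minimal sketch (not part of the original exercise) for spotting mismatched column names
# programmatically rather than by eye. The two toy data frames and their "demo_" names are
# hypothetical stand-ins for q1_2019 and q1_2020. The base:: prefix keeps the call
# unambiguous while the conflicted package is loaded.
demo_2019 <- data.frame(trip_id = 1L, start_time = "2019-01-01 00:04:37", usertype = "Subscriber")
demo_2020 <- data.frame(ride_id = "A1", started_at = "2020-01-01 00:04:44", member_casual = "member")
demo_only_2019 <- base::setdiff(colnames(demo_2019), colnames(demo_2020)) #names found only in the 2019-style table
demo_only_2020 <- base::setdiff(colnames(demo_2020), colnames(demo_2019)) #names found only in the 2020-style table
demo_only_2019
demo_only_2020
# Run on q1_2019 and q1_2020, the same two calls would list the columns whose names still differ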

# Rename columns to make them consistent with q1_2020 (as this will be the supposed
# going-forward table design for Divvy)

(q1_2019 <- rename(q1_2019

,ride_id = trip_id
,rideable_type = bikeid
,started_at = start_time
,ended_at = end_time
,start_station_name = from_station_name
,start_station_id = from_station_id
,end_station_name = to_station_name
,end_station_id = to_station_id
,member_casual = usertype
))

# Inspect the dataframes and look for incongruencies

str(q1_2019)
str(q1_2020)
# Convert ride_id and rideable_type to character so that they can stack correctly
q1_2019 <- mutate(q1_2019, ride_id = as.character(ride_id)
,rideable_type = as.character(rideable_type))

# Stack the individual quarters' data frames into one big data frame
all_trips <- bind_rows(q1_2019, q1_2020)

# Remove lat, long, birthyear, and gender fields as this data was dropped beginning in 2020
# Also remove tripduration, since ride_length is recalculated for the whole data frame below
all_trips <- all_trips %>%
select(-c(start_lat, start_lng, end_lat, end_lng, birthyear, gender, tripduration))

#======================================================
# STEP 3: CLEAN UP AND ADD DATA TO PREPARE FOR ANALYSIS
#======================================================
# Inspect the new table that has been created
colnames(all_trips) #List of column names
nrow(all_trips) #How many rows are in data frame?
dim(all_trips) #Dimensions of the data frame?
head(all_trips) #See the first 6 rows of data frame. Also tail(all_trips)
str(all_trips) #See list of columns and data types (numeric, character, etc)
summary(all_trips) #Statistical summary of data. Mainly for numerics

# There are a few problems we will need to fix:

# (1) In the "member_casual" column, there are two names for members ("member" and
# "Subscriber") and two names for casual riders ("Customer" and "casual"). We will need to
# consolidate that from four to two labels.
# (2) The data can only be aggregated at the ride-level, which is too granular. We will want to
# add some additional columns of data -- such as day, month, year -- that provide additional
# opportunities to aggregate the data.
# (3) We will want to add a calculated field for length of ride since the 2020Q1 data did not have
# the "tripduration" column. We will add "ride_length" to the entire dataframe for consistency.
# (4) There are some rides where tripduration shows up as negative, including several hundred
# rides where Divvy took bikes out of circulation for Quality Control reasons. We will want to
# delete these rides.

# In the "member_casual" column, replace "Subscriber" with "member" and "Customer" with
# "casual"
# Before 2020, Divvy used different labels for these two types of riders ... we will want to make
# our dataframe consistent with their current nomenclature
# N.B.: "Level" is a special property of a column that is retained even if a subset does not
# contain any values from a specific level
# Begin by seeing how many observations fall under each usertype
table(all_trips$member_casual)

# Reassign to the desired values (we will go with the current 2020 labels)
all_trips <- all_trips %>%
mutate(member_casual = recode(member_casual
,"Subscriber" = "member"
,"Customer" = "casual"))

# Check to make sure the proper number of observations were reassigned

table(all_trips$member_casual)

# Add columns that list the date, month, day, and year of each ride
# This will allow us to aggregate ride data for each month, day, or year ... before completing
# these operations we could only aggregate at the ride level
# https://www.statmethods.net/input/dates.html more on date formats in R found at that link
all_trips$date <- as.Date(all_trips$started_at) #The default format is yyyy-mm-dd
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")

# Add a "ride_length" calculation to all_trips (in seconds)

# https://stat.ethz.ch/R-manual/R-devel/library/base/html/difftime.html
all_trips$ride_length <- difftime(all_trips$ended_at, all_trips$started_at, units = "secs") #state the units so the result is always in seconds
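
# A small illustration on toy timestamps (hypothetical "demo_" names) of why it can help to state
# the units explicitly: with the default units = "auto", difftime() chooses the unit itself,
# so the result is not guaranteed to be in seconds.
demo_start <- as.POSIXct("2020-01-01 08:00:00", tz = "UTC")
demo_end <- as.POSIXct("2020-01-01 08:30:00", tz = "UTC")
demo_auto <- difftime(demo_end, demo_start) #reported in minutes for this pair
demo_secs <- difftime(demo_end, demo_start, units = "secs") #forced to seconds
units(demo_auto)
as.numeric(demo_secs) #plain number of seconds, ready for mean(), max(), etc.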

# Inspect the structure of the columns

str(all_trips)

# Convert "ride_length" from difftime to numeric so we can run calculations on the data
is.factor(all_trips$ride_length) #FALSE: difftime() returns a "difftime" object, not a factor
all_trips$ride_length <- as.numeric(all_trips$ride_length, units = "secs")
is.numeric(all_trips$ride_length)

# Remove "bad" data

# The dataframe includes a few hundred entries when bikes were taken out of docks and
# checked for quality by Divvy or ride_length was negative
# We will create a new version of the dataframe (v2) since data is being removed
# https://www.datasciencemadesimple.com/delete-or-drop-rows-in-r-with-conditions-2/
all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<0),]
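
# A hedged sketch on toy data (hypothetical "demo_" names), not part of the original exercise:
# if start_station_name or ride_length ever contained NA, a logical-index filter like the one
# above would return an all-NA row for each such entry. Testing the condition with %in% TRUE
# is one way to keep those rows unchanged and drop only the rows known to be bad.
demo_trips <- data.frame(start_station_name = c("HQ QR", "Clark St", NA, "State St"),
ride_length = c(120, 300, 450, -5))
demo_bad <- demo_trips$start_station_name == "HQ QR" | demo_trips$ride_length < 0
demo_naive <- demo_trips[!demo_bad, ] #the NA entry comes back as a row of NAs
demo_safe <- demo_trips[!(demo_bad %in% TRUE), ] #keeps the NA-station ride, drops the two bad rows
demo_naive
demo_safe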

#=====================================
# STEP 4: CONDUCT DESCRIPTIVE ANALYSIS
#=====================================
# Descriptive analysis on ride_length (all figures in seconds)
mean(all_trips_v2$ride_length) #straight average (total ride length / rides)
median(all_trips_v2$ride_length) #midpoint number in the ascending array of ride lengths
max(all_trips_v2$ride_length) #longest ride
min(all_trips_v2$ride_length) #shortest ride

# You can condense the four lines above to one line using summary() on the specific attribute
summary(all_trips_v2$ride_length)

# Compare members and casual users

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)

# See the average ride time by each day for members vs casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week,
FUN = mean)

# Notice that the days of the week are out of order. Let's fix that.
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, levels=c("Sunday", "Monday",
"Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
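
# Quick illustration on a toy vector (hypothetical "demo_" names) of what ordered() changes:
# sorting follows the supplied level order instead of alphabetical order.
demo_days <- c("Monday", "Sunday", "Friday")
demo_alpha <- sort(demo_days) #alphabetical order
demo_week <- sort(ordered(demo_days, levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday"))) #calendar order, starting on Sunday
demo_alpha
demo_week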

# Now, let's run the average ride time by each day for members vs casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week,
FUN = mean)

# analyze ridership data by type and weekday

all_trips_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>% #creates weekday field using wday()
group_by(member_casual, weekday) %>% #groups by usertype and weekday
summarise(number_of_rides = n() #calculates the number of rides
,average_duration = mean(ride_length)) %>% #calculates the average duration
arrange(member_casual, weekday) #sorts

# Let's visualize the number of rides by rider type

all_trips_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(number_of_rides = n()
,average_duration = mean(ride_length)) %>%
arrange(member_casual, weekday) %>%
ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
geom_col(position = "dodge")

# Let's create a visualization for average duration

all_trips_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(number_of_rides = n()
,average_duration = mean(ride_length)) %>%
arrange(member_casual, weekday) %>%
ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
geom_col(position = "dodge")

#=================================================
# STEP 5: EXPORT SUMMARY FILE FOR FURTHER ANALYSIS
#=================================================
# Create a csv file that we will visualize in Excel, Tableau, or my presentation software
# N.B.: This file location is for a Mac. If you are working on a PC, change the file location
# accordingly (most likely "C:\Users\YOUR_USERNAME\Desktop\...") to export the data. You can
# read more here: https://datatofish.com/export-dataframe-to-csv-in-r/
counts <- aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual +
all_trips_v2$day_of_week, FUN = mean)
write.csv(counts, file = 'avg_ride_length.csv')
