Reshaping Data With TidyR in R

The document provides an overview of reshaping data in R using the tidyr package, highlighting various functions such as unnest, pivot_longer, and separate. It emphasizes the principles of tidy data, including that every column should represent a variable, every row an observation, and every cell a single value. Additionally, it includes examples of manipulating datasets, including movies and music data, to demonstrate the application of these functions.

Uploaded by

Marc Patrick Margallo

Reshaping Data with tidyr in R

 
Learn R online at www.DataCamp.com

The fourth dataset is a synthetic dataset containing attributes of people. sex is a character vector, and hair_color is a factor.

sex     hair_color  height_cm  weight_kg
female  brown       166        72
male    blonde      184        NA
female  black       153        NA
male    black       192        93

# Expand nested data frame columns with unnest()
# Every top-level element of the nested data gets its own column in the result
# Vectors inside the nested data are given their own row
music_unnested <- music %>%
  unnest(singles)
# Roughly equivalent to music %>% unnest_longer(singles) %>% unnest_wider(singles)

artist     title          tracks
Bad Bunny  Gato de Noche  [[{"title":"Gato de Noche","collaborator":"Ñengo Flow"}]]
Bad Bunny  La Jumpa       [[{"title":"La Jumpa","collaborator":"Arcángel"}]]
Drake      Scary Hours 2  [[{"title":"What's Next"},{"title":"Wants and Needs","collaborator":"Lil Baby"},{"tit...
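The unnest() step can be tried on a small stand-in without the JSON data. The sketch below is illustrative only: music_toy, its artist names, and the n_tracks column are hypothetical and not part of the cheat sheet's music dataset. It assumes the tibble package is available, which is installed alongside tidyr.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for the music data: one row per artist, with a
# list-column that holds a small data frame of singles.
music_toy <- tibble(
  artist = c("Artist A", "Artist B"),
  singles = list(
    tibble(title = c("Song 1", "Song 2"), n_tracks = c(1, 2)),
    tibble(title = "Song 3", n_tracks = 3)
  )
)

# unnest() gives every inner row its own output row and every inner
# column its own output column.
music_flat <- music_toy %>%
  unnest(singles)

# nest() goes the other way: title and n_tracks are folded back into a
# list-column, leaving one row per artist.
music_renested <- music_flat %>%
  nest(singles = c(title, n_tracks))

print(music_flat)
print(music_renested)
```

The flattened result has one row per single, and nesting it again returns to one row per artist.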

# Summarize parts of a data frame as a list of data frames with nest()
music_unnested %>%
  nest(singles = c(title, tracks))

artist     singles
Bad Bunny  [[{"title":"Gato de Noche","tracks":[{"title":"Gato de Noche", "collaborator":"Ñengo Flow"}]},{"title":"La Jumpa","...
Drake      [[{"title":"Scary Hours 2","tracks":[{"title":"What's Next"},{"title":"Wants and Needs","collaborator":"Lil Baby"},{"...

> Definitions

The majority of data analysis in R is performed in data frames. These are rectangular datasets consisting of rows and columns.

An observation contains all the values or variables related to a single instance of the objects being analyzed. For example, in a dataset of movies, each movie would be an observation.

A variable is an attribute for the object, across all the observations. For example, the release dates for all the movies.

Tidy data provides a standard way to organize data. Having a consistent shape for datasets enables you to worry less about data structures and focus more on getting useful results. The principles of tidy data are:
Every column is a variable.
Every row is an observation.
Every cell is a single value.

> Uniting and separating columns

# Combine several columns into a single vector column with unite()
movies %>%
  unite(release_date, c(release_year, release_month, release_day), sep = "-")

# Split a single vector column into several columns with separate()
movies %>%
  separate(directors, into = c("director1", "director2"), sep = ",", fill = "right")

# Split a single column into several rows with separate_rows()
movies %>%
  separate_rows(directors, sep = ",")
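As a rough illustration of uniting and separating, the sketch below uses a two-row excerpt of the movies table. The name movies_toy is hypothetical, and the directors string is written without a space after the comma so that sep = "," splits cleanly. That is an assumption about the stored values; data with ", " as the delimiter would leave a leading space in the second piece.

```r
library(tidyr)

# Two-row excerpt of the movies data (movies_toy is a hypothetical name).
movies_toy <- data.frame(
  title = c("Avatar", "Avengers: Endgame"),
  release_year = c(2009, 2019),
  release_month = c(12, 4),
  release_day = c(18, 22),
  directors = c("James Cameron", "Anthony Russo,Joe Russo"),
  stringsAsFactors = FALSE
)

# unite() pastes the three date parts into one character column and,
# by default, drops the input columns.
united <- movies_toy %>%
  unite(release_date, c(release_year, release_month, release_day), sep = "-")

# separate() splits directors into two columns; fill = "right" pads
# single-director rows with NA on the right.
split_wide <- movies_toy %>%
  separate(directors, into = c("director1", "director2"), sep = ",", fill = "right")

# separate_rows() gives each director a row of their own.
split_long <- movies_toy %>%
  separate_rows(directors, sep = ",")

print(united$release_date)
print(split_wide[, c("title", "director1", "director2")])
print(split_long[, c("title", "directors")])
```

In recent tidyr releases separate() and separate_rows() are marked as superseded in favour of separate_wider_delim() and separate_longer_delim(); the older functions still work as shown.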
> Dealing with missing data

# Drop rows containing any missing values in the specified columns with drop_na()
people %>%
  drop_na(weight_kg)

# Replace missing values with a default value with replace_na()
people %>%
  replace_na(list(weight_kg = 100))

> Packing and unpacking columns

# Combine several columns into a data frame column with pack()
movies_packed <- movies %>%
  pack(release_date = c(release_year, release_month, release_day))
# The release_date column is a data frame with 5 rows, 3 columns

# Split a single data frame column into several columns with unpack()
movies_packed %>%
  unpack(release_date)
# release_date column replaced with release_year, release_month, release_day columns

> Helpful syntax before getting started

Installing and loading tidyr

# Install tidyr through tidyverse
install.packages("tidyverse")

# Install it directly
install.packages("tidyr")

# Load tidyr into R
library(tidyr)

> Creating grids

# Get all combinations of input values with expand_grid()
expand_grid(
  sex = c("male", "female", "female"),
  hair_color = c("red", "brown", "blonde", "black", "red")
)
# 2 column data frame with rows like "male", "red".
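To make the handling of repeated inputs concrete, the sketch below passes the same two vectors to expand_grid() and to crossing() and compares the row counts. The variable names grid_all and grid_unique are illustrative.

```r
library(tidyr)

sex <- c("male", "female", "female")
hair_color <- c("red", "brown", "blonde", "black", "red")

# expand_grid() keeps repeated inputs: 3 x 5 = 15 rows.
grid_all <- expand_grid(sex = sex, hair_color = hair_color)

# crossing() de-duplicates and sorts each input first:
# 2 unique sexes x 4 unique hair colors = 8 rows.
grid_unique <- crossing(sex = sex, hair_color = hair_color)

print(nrow(grid_all))
print(nrow(grid_unique))
print(grid_unique)
```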

# Get all combinations of input values, deduplicating and sorting with crossing()
crossing(
  sex = c("male", "female", "female"),
  hair_color = c("red", "brown", "blonde", "black", "red")
)
# Same as expand_grid() but "red" rows only appear once and order is alphabetical

The %>% Operator

%>% is a special operator in R found in the magrittr and tidyr packages. %>% lets you pass objects to functions elegantly, and helps you make your code more readable. The following two lines of code are equivalent.

# Without the %>% operator
second_function(first_function(dataset, arg1, arg2), arg3)

# With the %>% operator
dataset %>% first_function(arg1, arg2) %>% second_function(arg3)

> Pivoting

# Move side-by-side columns to consecutive rows with pivot_longer()
popcorn_long <- popcorn %>%
  pivot_longer(trial_1:trial_6, names_to = "trial", values_to = "n_unpopped")
# "brand" column contains "Orville" and "Seaway"
# "trial" column contains "trial_1" to "trial_6"
# "n_unpopped" column contains the numbers

# Move values in different rows to columns with pivot_wider()
popcorn_long %>%
  pivot_wider(id_cols = brand, names_from = "trial", values_from = "n_unpopped")
# Same contents and shape as popcorn dataset
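The round trip between the wide and long shapes can be checked directly. The sketch below rebuilds the popcorn table from the values listed in this cheat sheet; popcorn_wide is a hypothetical name for the result of pivoting back.

```r
library(tidyr)

# The popcorn data as listed in this cheat sheet.
popcorn <- data.frame(
  brand = c("Orville", "Seaway"),
  trial_1 = c(26, 47), trial_2 = c(35, 47), trial_3 = c(18, 14),
  trial_4 = c(14, 34), trial_5 = c(8, 21), trial_6 = c(6, 37),
  stringsAsFactors = FALSE
)

# Wide to long: 2 brands x 6 trials = 12 rows.
popcorn_long <- popcorn %>%
  pivot_longer(trial_1:trial_6, names_to = "trial", values_to = "n_unpopped")

# Long back to wide: one row per brand, one column per trial.
popcorn_wide <- popcorn_long %>%
  pivot_wider(id_cols = brand, names_from = trial, values_from = n_unpopped)

print(popcorn_long)
print(popcorn_wide)
```

Naming id_cols explicitly avoids relying on argument position, which newer tidyr versions warn about.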
> Datasets used throughout this cheat sheet

Throughout this cheat sheet we will use a dataset of the top grossing movies of all time, stored as movies.

title                                 release_year  release_month  release_day  directors                 box_office_busd
Avatar                                2009          12             18           James Cameron             2.922
Avengers: Endgame                     2019          4              22           Anthony Russo, Joe Russo  2.798
Titanic                               1997          11             01           James Cameron             2.202
Star Wars Ep. VII: The Force Awakens  2015          12             14           J.J Abrams                2.068
Avengers: Infinity War                2018          4              23           Anthony Russo, Joe Russo  2.048

> Nesting and unnesting

# Expand nested data frame columns with unnest_longer()
# Vectors inside the nested data are given their own row
# The number of columns remains unchanged
music %>%
  unnest_longer(singles)

artist     singles$title  singles$tracks
Bad Bunny  Gato de Noche  2 Variables
Bad Bunny  La Jumpa       2 Variables
Drake      Scary Hours 2  1 Variable

> Creating grids (continued)

# Get all combinations of values in data frame columns with expand()
# All factor levels included, even if they don't appear in data
people %>%
  expand(sex, hair_color)
# Equivalent to expand_grid(unique(people$sex), levels(people$hair_color))

# Get all combinations of values that exist in data frame columns with expand() + nesting()
people %>%
  expand(nesting(sex, hair_color))
# As previous, but filtered to rows that exist in people dataset

# Expand the data frame, then full join to itself with complete()
people %>%
  complete(sex, hair_color)
# Same output as expand, with additional height_cm and weight_kg columns
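The difference between expand(), expand() with nesting(), and complete() shows up in the row counts. The sketch below rebuilds the people dataset under two loudly labeled assumptions: the two blank weights are stored as NA, and hair_color carries an unused "red" level (the cheat sheet does not list the factor levels).

```r
library(tidyr)

# Reconstruction of the people dataset. Assumptions: missing weights are
# NA, and the hair_color factor has an unused "red" level.
people <- data.frame(
  sex = c("female", "male", "female", "male"),
  hair_color = factor(
    c("brown", "blonde", "black", "black"),
    levels = c("black", "blonde", "brown", "red")
  ),
  height_cm = c(166, 184, 153, 192),
  weight_kg = c(72, NA, NA, 93),
  stringsAsFactors = FALSE
)

# Every sex paired with every factor level, including the unused "red":
# 2 x 4 = 8 rows.
all_combos <- people %>% expand(sex, hair_color)

# Only the sex / hair_color pairs that occur in the data: 4 rows.
seen_combos <- people %>% expand(nesting(sex, hair_color))

# All 8 combinations, with height_cm and weight_kg kept (NA where the
# combination does not occur in people).
completed <- people %>% complete(sex, hair_color)

print(all_combos)
print(seen_combos)
print(completed)
```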

The second dataset involves an experiment with the number of unpopped kernels in bags of popcorn, adapted from the Popcorn dataset in the Stat2Data package.

brand    trial_1  trial_2  trial_3  trial_4  trial_5  trial_6
Orville  26       35       18       14       8        6
Seaway   47       47       14       34       21       37

The third dataset is JSON data about music containing nested elements. The JSON is parsed into nested lists using parse_json() from the jsonlite package.

artist     singles
           title          tracks
Bad Bunny  Gato de Noche  Gato de Noche, Ñengo Flow
           La Jumpa       La Jumpa, Arcángel
Drake      Scary Hours 2  What's Next, Wants and Needs, Lemon Pepper Freestyle, NA, Lil Baby, Rick Ross

> Nesting and unnesting (continued)

# Expand nested data frame columns with unnest_wider()
# Top-level elements inside the nested data are given their own column
# The number of rows remains unchanged
music %>%
  unnest_wider(singles)

artist     title                           tracks
Bad Bunny  [["Gato de Noche","La Jumpa"]]  [[[{"title":"Gato de Noche","collaborator":"Ñengo Flow"}],[{"title":"...
Drake      ["Scary Hours 2"]               [[[{"title":"What's Next"},{"title":"Wants and Needs","collaborator"...

# Expand selected nested data frame columns with hoist()
# Replacement for unnest_wider() %>% select()
music %>%
  hoist(singles, single_titles = "title")

artist     single_titles                   singles
Bad Bunny  [["Gato de Noche","La Jumpa"]]  [[{"tracks":[{"title":"Gato de Noche","collaborator":"Ñengo Flow"}...
Drake      ["Scary Hours 2"]               [[{"tracks":[{"title":"What's Next"},{"title":"Wants and Needs","coll...

> Creating grids (continued)

# Fill in sequence of numeric or datetime columns with expand() + full_seq()
people %>%
  expand(height_cm_expanded = full_seq(height_cm, 1))
# 1 column data frame with height_cm_expanded values
# from min height_cm to max height_cm in steps of 1
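The full_seq() step can be checked on the four heights from the people table. This is a minimal sketch; people_heights is a hypothetical one-column stand-in for the full dataset.

```r
library(tidyr)

# Heights from the people table; people_heights is a stand-in data frame.
people_heights <- data.frame(height_cm = c(166, 184, 153, 192))

# full_seq() fills in every value from the minimum to the maximum in
# steps of the given period (here 1 cm).
height_range <- full_seq(people_heights$height_cm, 1)

# Inside expand(), the filled sequence becomes a one-column data frame.
expanded <- people_heights %>%
  expand(height_cm_expanded = full_seq(height_cm, 1))

print(range(height_range))
print(nrow(expanded))
```

With a minimum of 153 and a maximum of 192, the expanded column holds 40 values.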
