R Basic and Advanced


Libraries:

The Tidyverse includes several popular R packages, such as:
dplyr: for data manipulation and analysis
ggplot2: for data visualization
tidyr: for data transformation and reshaping
readr: for data import and export of text files (writexl, a separate package that is not part of the tidyverse, exports to Excel files)
purrr: for functional programming and data manipulation
stringr: for string manipulation
forcats: for categorical data manipulation

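The MDSR Library:

The mdsr package accompanies the book Modern Data Science with R.
Explore the MDSR datasets, using the data() function, for example data(package = "mdsr")
The original notes also list functions named mdsr_clean(), mdsr_visualize(), mdsr_import() and mdsr_export(). These names could not be confirmed, so check help(package = "mdsr") for the functions the package actually provides
Work through the MDSR course and book series, using the library to support your learning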

Lubridate is an R package that provides a set of functions for working with dates and times.

• Tidyverse: This is a collection of packages that provide a consistent and intuitive way of working with data in R. The core packages in the tidyverse include:
• dplyr: For data manipulation and filtering
• tidyr: For data transformation and reshaping
• ggplot2: For data visualization
• readr: For reading and parsing data files
• lubridate: For working with dates and times

Other Important Packages:

• stringr: For string manipulation and text analysis
• magrittr: For piping operations together
• pacman: For package management and installation

Key Functions to Know:

• dplyr:
  ▪ filter(): For filtering rows
  ▪ select(): For selecting specific columns
  ▪ mutate(): For creating new columns
  ▪ group_by(): For grouping data
  ▪ summarise(): For summarizing data
  ▪ glimpse(): Available through dplyr, which is part of the tidyverse. It prints a concise, column-by-column overview of a data frame, similar to str()
• tidyr:
  ▪ pivot_longer(): For converting data from wide to long format
  ▪ pivot_wider(): For converting data from long to wide format
  ▪ drop_na(): For removing rows that contain missing values
• lubridate:
  ▪ year(): For extracting the year from a date
  ▪ month(): For extracting the month from a date
  ▪ day(): For extracting the day from a date
• ggplot2:
  ▪ ggplot(): For creating visualizations
  ▪ aes(): For mapping variables to visual properties
  ▪ geom_point(): For creating scatter plots
  ▪ geom_bar(): For creating bar charts
• nycflights13 package (datasets):
  ▪ flights: all flights that departed from NYC in 2013
  ▪ weather: hourly meteorological data for each airport
  ▪ planes: construction information about each plane
  ▪ airports: airport names and locations
  ▪ airlines: translation between two-letter carrier codes and names
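
The tidyr functions listed above can be tried on a small table. This is a minimal sketch, assuming the tidyr package is installed; the data frame `wide` and its column names are made up for illustration.

```r
library(tidyr)

# Hypothetical wide table: one column per year, with one missing value
wide <- data.frame(
  country = c("A", "B"),
  y2012   = c(10, NA),
  y2013   = c(11, 21)
)

# Wide to long: the year columns become rows
long <- pivot_longer(wide, cols = c(y2012, y2013),
                     names_to = "year", values_to = "value")

# Remove the rows that contain a missing value
complete <- drop_na(long)

# Long back to wide: one column per year again
wide_again <- pivot_wider(long, names_from = year, values_from = value)
```

Here `long` has one row per country and year, and `complete` is the same table without the row whose value is missing.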
Functions:
filter(): Select rows that meet one or more conditions.
arrange(): Sort rows in ascending order, or in descending order with desc().
group_by(): Divide data into groups based on one or more variables.
summarise(): Calculate summary statistics for each group.
mutate(): Add new columns to the data, or overwrite existing ones.
select(): Select specific columns from the data.
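
The verbs above are usually chained with the pipe. The following is a minimal sketch, assuming dplyr is installed; the `trips` data frame is invented for illustration and is not one of the nycflights13 tables.

```r
library(dplyr)

# Hypothetical flight-like data (distance in miles, air_time in minutes)
trips <- data.frame(
  carrier  = c("AA", "AA", "UA", "UA", "DL"),
  distance = c(500, 1500, 700, 2500, 900),
  air_time = c(90, 210, 120, 330, 140)
)

by_carrier <- trips %>%
  filter(distance > 600) %>%                    # keep only the longer trips
  mutate(speed = distance / air_time * 60) %>%  # add a miles-per-hour column
  select(carrier, distance, speed) %>%          # keep three columns
  group_by(carrier) %>%                         # one group per carrier
  summarise(n_trips = n(),
            mean_distance = mean(distance)) %>% # one summary row per group
  arrange(desc(mean_distance))                  # largest mean first
```

Each step receives the result of the previous one, so the pipeline reads top to bottom in the order the operations are applied.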

• select is for selecting columns
• filter is for selecting rows based on conditions
Note that in R a range test cannot be written as a chained comparison such as 1 < x < 5. Write each comparison separately and combine them with the & operator, for example x > 1 & x < 5.
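
A small sketch of combining conditions in filter(), assuming dplyr is installed. The `scores` data frame is hypothetical.

```r
library(dplyr)

# Hypothetical toy data
scores <- data.frame(
  name  = c("Ann", "Bob", "Cara", "Dev"),
  score = c(45, 72, 88, 95)
)

# Two separate comparisons joined with &
mid_range <- scores %>%
  filter(score > 60 & score < 90)

# Conditions separated by a comma inside filter() are also combined with AND
mid_range_comma <- scores %>%
  filter(score > 60, score < 90)

# between() is a dplyr shorthand for an inclusive range (>= and <=)
mid_range_between <- scores %>%
  filter(between(score, 61, 89))
```

All three calls keep the same two rows here. Use | instead of & when either condition is enough.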

Date functions (lubridate), often used inside mutate():
ymd(): parses a character string into a Date object (year-month-day format)
mdy(): parses a character string into a Date object (month-day-year format)
dmy(): parses a character string into a Date object (day-month-year format)
interval(): creates an interval object representing the time span between a specific start and a specific end
duration(): creates a duration object representing an exact length of time, measured in seconds
period(): creates a period object representing a length of time in calendar units such as days or months
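
The date helpers above can be sketched as follows, assuming the lubridate package is installed. The dates are arbitrary examples.

```r
library(lubridate)

# The function name gives the order of the parts in the input string
d1 <- ymd("2013-01-31")   # year-month-day
d2 <- mdy("01/31/2013")   # month-day-year
d3 <- dmy("31.01.2013")   # day-month-year

# All three strings describe the same day
same_date <- d1 == d2 && d2 == d3

# Extract single components from a date
y <- year(d1)
m <- month(d1)
d <- day(d1)

# interval: the span between two specific dates
span <- interval(ymd("2013-01-01"), ymd("2013-03-01"))
span_days <- time_length(span, "day")

# duration: an exact number of seconds
one_day_duration <- duration(1, "days")

# period: a length of time in calendar units
one_day_period <- period(1, "days")
next_day <- d1 + one_day_period
```

A duration always has the same length in seconds, while a period follows the calendar, so the two can differ around daylight saving changes or when adding months.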

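Joins (dplyr):
inner_join(): inner_join(table1, table2, by = "id") keeps only the rows whose id appears in both tables
left_join(): keeps every row of the first table and fills the columns of unmatched rows with NA
full_join(): keeps every row from both tables
nrow(flights): returns the number of rows, which is a quick way to check the size of a table before and after a join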
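
The three joins differ only in which unmatched rows they keep. A minimal sketch, assuming dplyr is installed; `table1` and `table2` are hypothetical tables that share an `id` key.

```r
library(dplyr)

# Hypothetical tables sharing an "id" column
table1 <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cara"))
table2 <- data.frame(id = c(2, 3, 4), city = c("Oslo", "Lima", "Pune"))

inner <- inner_join(table1, table2, by = "id")  # ids present in both tables
left  <- left_join(table1, table2, by = "id")   # every id from table1
full  <- full_join(table1, table2, by = "id")   # every id from either table

# Compare the row counts of the three results
counts <- c(inner = nrow(inner), left = nrow(left), full = nrow(full))
```

In `left`, the city for id 1 is NA because table2 has no matching row.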


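Other methods and functions:

class(A), str(A): find the type and structure of an object
head(A), summary(A), glimpse(A): preview or summarize an object
package_name::function_name: call a function from a specific package
?function_name or help(function_name): open the help page for a function
as.integer() or ……..: convert a value to another type
tribble(): make a small table row by row
paste() for concatenation: paste is a function that concatenates strings or vectors of strings; with the collapse argument it joins a vector into a single string
n_distinct(): count the distinct values in a vector (dplyr)
sum(!is.na(name)): count the non-missing values in name
sort(): sort a vector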
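
A short sketch of paste(), n_distinct() and the sum(!is.na()) idiom from the list above, assuming dplyr is installed for n_distinct(). The `name` vector is made up.

```r
library(dplyr)

# Hypothetical vector with one missing value and one repeat
name <- c("Ann", NA, "Bob", "Ann")

# paste() joins its arguments with a space by default
greeting <- paste("Hello", "world")

# sep controls the separator between arguments; vectors are recycled
ids <- paste("id", 1:3, sep = "_")

# collapse joins the elements of one vector into a single string
joined <- paste(c("a", "b", "c"), collapse = "-")

# n_distinct() counts NA as a value unless na.rm = TRUE
n_unique         <- n_distinct(name)
n_unique_present <- n_distinct(name, na.rm = TRUE)

# !is.na() gives TRUE for present values; sum() counts the TRUEs
n_present <- sum(!is.na(name))
```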
*** Important difference between sort and arrange:
arrange() (dplyr) takes a data frame and returns a new data frame with the rows sorted in ascending order by yearID. The output has the same structure as the original data frame, but with the rows rearranged according to the sorting criteria.
sort() (base R) works on a vector and returns a sorted vector, not a data frame. Applied to df$yearID, the output is a single vector with the sorted values of the yearID column.

library(dplyr)
df <- data.frame(yearID = c(1992, 1990, 1991, 1992, 1990, 1991),
                 teamID = c(10, 10, 10, 10, 10, 10),
                 playerID = c(123, 123, 123, 123, 123, 123))

# sort() needs a vector, so pass the column rather than the whole data frame
sorted_vec <- sort(df$yearID)
sorted_vec
# arrange() sorts the rows of the whole data frame
sorted_df <- df %>% arrange(yearID)
sorted_df

Output of sort:
[1] 1990 1990 1991 1991 1992 1992

Output of arrange:
  yearID teamID playerID
1   1990     10      123
2   1990     10      123
3   1991     10      123
4   1991     10      123
5   1992     10      123
6   1992     10      123
sum(): calculates the sum of a numeric vector. It is not suitable for counting the number of characters in a string.
length(): returns the number of elements in a vector, including character vectors. It does not count the characters within each string.
nchar(): returns the number of characters in each string. Use it to count characters, for example in a name column.
nzchar(): returns TRUE for each non-empty string and FALSE for each empty string
strsplit(): splits each string into pieces at a given separator and returns a list
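
The difference between these functions is easiest to see on one small vector. A base R sketch; the `name` vector is hypothetical.

```r
# Hypothetical character vector; the last element is an empty string
name <- c("Welcome", "to", "Geeks", "")

n_elements  <- length(name)      # number of elements in the vector
n_chars     <- nchar(name)       # number of characters in each element
total_chars <- sum(nchar(name))  # sum() works on the numeric result of nchar()
non_empty   <- nzchar(name)      # FALSE only for the empty string

# strsplit() returns a list with one element per input string
words <- strsplit("for Geeks", " ")[[1]]
```

Calling sum(name) directly fails, because sum() only accepts numeric, complex or logical input.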

The syntax is x %in% y, where x is the vector or column you want to check, and y is the vector or column of values to check against. The result is a logical vector with one TRUE or FALSE per element of x, so %in% can test the values of a column or, inside filter() or [ ], select the matching rows.
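
A base R sketch of %in% on a vector and on a data frame column. The carrier codes and the `flights_small` data frame are invented for illustration.

```r
carriers <- c("AA", "UA", "DL", "B6")
wanted   <- c("UA", "B6", "WN")

# One TRUE/FALSE per element of carriers
is_wanted <- carriers %in% wanted

# Use the logical vector to keep the matching elements
kept <- carriers[is_wanted]

# The same test selects rows of a data frame
flights_small <- data.frame(carrier = carriers, delay = c(5, 12, -3, 40))
subset_rows <- flights_small[flights_small$carrier %in% wanted, ]
```

Values in `wanted` that never occur in `carriers` (here "WN") are simply ignored.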

Details of making Table:


Method 1: Using the data.frame function

# Create a table with 3 columns and 4 rows
table <- data.frame(
  Name = c("John", "Mary", "David", "Emily"),
  Age = c(25, 31, 42, 28),
  Country = c("USA", "Canada", "UK", "Australia")
)
# Print the table (R is case-sensitive, so the name must be written exactly as assigned)
table

Method 2: Using the tibble function

# Create a table with 3 columns and 4 rows
library(tibble)
table <- tibble(
  Name = c("John", "Mary", "David", "Emily"),
  Age = c(25, 31, 42, 28),
  Country = c("USA", "Canada", "UK", "Australia")
)
# Print the table
table

Method 3: Using a matrix

# Create a matrix with 3 columns and 4 rows
matrix <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow = 4, ncol = 3)
# Convert the matrix to a table
table <- as.data.frame(matrix)
# Print the table
table

Method 4: Reading in data from a file

If you have data in a file (e.g., CSV, Excel, or text file), you can read it into R using various functions such as read.csv, read.table, or read_excel. Here's an example:

# Read in a CSV file
table <- read.csv("data.csv")
# Print the table
table


Conditional operators :

Loops and if condition:

List :

Vector:

Matrix:

Dataframe:
Create data frame:
df <- data.frame(
column1 = c(values),
column2 = c(values),
...
)
Functions on dataframe:
str(df): shows the structure of the data frame (column names, types, first values) in the console
print(df): prints the data frame in the console as a tidy table

sample_n(): from the dplyr package. This function takes a random sample of rows from a dataframe. In dplyr 1.0.0 and later it is superseded by slice_sample(n = 3), but it still works.

library(dplyr)
# Create a dataframe
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
# Take a random sample of 3 rows
Random_subset <- df %>% sample_n(3)
# Print the random subset
print(Random_subset)

Alternatively, you can use the base R sample() function to take a random sample of rows. Here's an example:
# Take a random sample of 3 rows
Random_subset <- df[sample(nrow(df), 3), ]
# Print the random subset
print(Random_subset)

Column:
names(df) or str(df) to see the column names
Access specific columns in a dataframe using the $ operator or [[ ]]: df$year or df[["year"]]

Removing a column: df %>% select(-year) or df$year <- NULL

Converting a column to a string: df$year <- as.character(df$year)

Applying a function to a specific column: use the mutate function from the dplyr package.
df %>% mutate(n_sqrt = sqrt(n)): creates a new column n_sqrt
df %>% mutate(prop = prop * 10000000/1000000): overwrites the existing column prop, so no new column is created
Or you can use the $ operator to access the column and apply the function directly, like this: df$n <- sqrt(df$n)

Finding columns with NaN values: sapply(df, function(x) any(is.nan(x)))

Finding duplicate values in a column:
df %>%
  group_by(name) %>%
  filter(n() > 1)
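
The column operations above can be run together on a small table. This is a sketch, assuming dplyr is installed; the values in `df` are invented, and only the column names (year, sex, name, n, prop) follow the ones used in these notes.

```r
library(dplyr)

# Hypothetical data with the column names used in these notes
df <- data.frame(
  year = c(2020, 2020, 2021),
  sex  = c("F", "M", "F"),
  name = c("Ava", "Liam", "Ava"),
  n    = c(16, 25, 9),
  prop = c(0.004, 0.005, NaN)
)

col_names <- names(df)                          # column names
same_col  <- identical(df$year, df[["year"]])   # two ways to access a column

with_sqrt <- df %>% mutate(n_sqrt = sqrt(n))    # adds a new column
no_year   <- df %>% select(-year)               # drops a column

df$year <- as.character(df$year)                # convert a column to strings

# TRUE for every column that contains at least one NaN
has_nan <- sapply(df, function(x) any(is.nan(x)))

# Rows whose name occurs more than once
repeated_names <- df %>%
  group_by(name) %>%
  filter(n() > 1) %>%
  ungroup()
```

The call n() still refers to the dplyr counting function even though the table has a column called n, because R looks for a function when a name is used as a call.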

Row:
Access the first row of the df dataframe with df[1, ]; access the first column with df[, 1]

Adding a row: rbind()

# Build the new row as a one-row data frame so that each column keeps its type.
# c(2022, "M", "John", 50, 0.0001) would turn every value into a character string.
# The column names follow the group_by() example in these notes.
new_row <- data.frame(year = 2022, sex = "M", name = "John", n = 50, prop = 0.0001)
df <- rbind(df, new_row)

Deleting a row: use the slice function from dplyr. The slice function takes a dataframe and a vector of row indices as arguments. df <- df %>% slice(-1) keeps all rows except the first row.

Alternatively, you can use a negative index with the [ operator to remove the first row, like df <- df[-1, ]

Changing a row:
To change a row in a dataframe, you can use the [ operator to access the row and assign new values to it. For example, to change the first row of the df dataframe, you can use a list, which keeps the type of each value:

df[1, ] <- list(2022, "M", "John", 50, 0.0001)
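
A base R sketch of adding, changing and deleting rows. It assumes R 4.0 or later (where character columns are not converted to factors by default), and the data values are made up. The first lines show why a mixed c() vector is risky as a row.

```r
# Hypothetical data with the column names used in these notes
df <- data.frame(
  year = c(2020, 2021),
  sex  = c("F", "M"),
  name = c("Ava", "Liam"),
  n    = c(16, 25),
  prop = c(0.004, 0.005)
)

# c() holds one type only, so mixing numbers and text gives all strings
mixed <- c(2022, "M", "John", 50, 0.0001)
mixed_class <- class(mixed)

# Adding a row: a one-row data frame keeps each column's type
new_row <- data.frame(year = 2022, sex = "M", name = "John", n = 50, prop = 0.0001)
df2 <- rbind(df, new_row)

# Changing a row: a list can hold a different type per column
df2[1, ] <- list(2019, "F", "Eve", 7, 0.002)

# Deleting a row: a negative index drops it
df3 <- df2[-1, ]
```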

Applying a function to a specific row:

You can use the rowwise function from the dplyr package. The rowwise function allows you to apply a function to each row of a dataframe.

df %>% rowwise() %>% mutate(sum = sum(n, prop))

1. rowwise():
• This is a function from the dplyr package that groups the dataframe by rows.
• When you use rowwise(), each row of the dataframe is treated as a separate group.
• This is similar to using apply(df, 1, ...) in base R, but rowwise() fits into a dplyr pipeline and keeps the type of each column, whereas apply() first converts the data to a matrix. For simple arithmetic, a vectorized expression such as n + prop is faster than either.
• With the rowwise() and mutate() functions, the code applies the sum() function to each row of the dataframe, and the result is added as a new column sum to the dataframe.

df$sum <- apply(df[, c("n", "prop")], 1, sum):

▪ The apply function takes three arguments:
▪ The first argument is the dataframe or matrix that we want to apply the function to. In this case, it's df[, c("n", "prop")].
▪ The second argument is the MARGIN argument, which specifies whether we want to apply the function to rows (1) or columns (2). In this case, we're using 1, which means we want to apply the function to each row.
▪ The third argument is the function that we want to apply. In this case, it's the sum function.
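
The row-wise approaches above give the same numbers. A minimal sketch, assuming dplyr is installed; the values in `df` are hypothetical.

```r
library(dplyr)

# Hypothetical numeric columns n and prop
df <- data.frame(n = c(10, 20, 30), prop = c(0.1, 0.2, 0.3))

# dplyr: treat each row as its own group, then sum within the row
row_dplyr <- df %>%
  rowwise() %>%
  mutate(sum = sum(n, prop)) %>%
  ungroup()

# Base R: MARGIN = 1 applies sum() to each row of the selected columns
row_apply <- apply(df[, c("n", "prop")], 1, sum)

# Vectorized arithmetic does the same without any row-by-row loop
row_vectorised <- df$n + df$prop
```

Calling ungroup() after rowwise() returns an ordinary table, so later steps are not applied row by row by accident.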
Finding rows with NaN values: df[is.nan(df$prop), ] returns all rows where the prop column has NaN values.

Finding duplicate rows: to find duplicate rows, we can use the duplicated() function. Combining it with fromLast = TRUE returns every copy of a repeated row, including the first one:

df[duplicated(df) | duplicated(df, fromLast = TRUE), ]

Alternatively, we can use the group_by() and filter() functions from the dplyr package:

library(dplyr)
df %>%
  group_by(year, sex, name, n, prop) %>%
  filter(n() > 1)
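
Both ways of finding duplicate rows can be compared on a small table. This is a sketch, assuming dplyr is installed; the rows of `df` are invented so that rows 1 and 4 are identical and row 3 has a NaN.

```r
library(dplyr)

# Hypothetical data: rows 1 and 4 are exact copies, row 3 has prop = NaN
df <- data.frame(
  year = c(2020, 2020, 2021, 2020),
  sex  = c("F", "M", "F", "F"),
  name = c("Ava", "Liam", "Ava", "Ava"),
  n    = c(16, 25, 9, 16),
  prop = c(0.004, 0.005, NaN, 0.004)
)

# Rows where prop is NaN
nan_rows <- df[is.nan(df$prop), ]

# Base R: every copy of a repeated row, including the first occurrence
dup_base <- df[duplicated(df) | duplicated(df, fromLast = TRUE), ]

# dplyr: group by all columns and keep groups with more than one row
dup_dplyr <- df %>%
  group_by(year, sex, name, n, prop) %>%
  filter(n() > 1) %>%
  ungroup()

# distinct() keeps one copy of each row
unique_rows <- distinct(df)
```

Using duplicated(df) alone would flag only the second and later copies, which is what you want when removing duplicates rather than inspecting them.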

Class:

Seq:

Arithmetic functions:

Operators:
