R Basic and Advanced
The Tidyverse
Tidyverse: This is a collection of R packages that provide a consistent and intuitive way of
working with data in R. The tidyverse packages used in these notes are:
dplyr: For data manipulation and filtering
tidyr: For data transformation and reshaping
ggplot2: For data visualization
readr: For reading and parsing data files
lubridate: For working with dates and times
dplyr:
filter(): For filtering data
select(): For selecting specific columns
mutate(): For creating new columns
group_by(): For grouping data
summarise(): For summarizing data
glimpse(): Available when dplyr is loaded, as part of the tidyverse. It prints a concise
overview of a data frame, one line per column with its type and first values, similar to str()
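The dplyr functions listed above can be sketched together on a small example. This is only an illustration: the people data frame and its columns are made up for these notes.

```r
library(dplyr)

# Hypothetical example data: one row per person
people <- data.frame(
  name   = c("Ann", "Bob", "Cara", "Dan"),
  group  = c("a", "a", "b", "b"),
  height = c(160, 175, 168, 182)
)

# filter(): keep rows that match a condition
tall <- people %>% filter(height > 165)

# select(): keep only some columns
names_only <- people %>% select(name, group)

# mutate(): add a new column computed from existing ones
people_m <- people %>% mutate(height_m = height / 100)

# group_by() + summarise(): one summary row per group
by_group <- people %>%
  group_by(group) %>%
  summarise(mean_height = mean(height))

# glimpse(): compact overview, one line per column
glimpse(people_m)
```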
tidyr:
pivot_longer(): For converting data from wide to long format
pivot_wider(): For converting data from long to wide format
drop_na(): For removing rows that contain missing values
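A minimal sketch of the three tidyr functions above, assuming a made-up table with one column per year (the wide data frame and its values are hypothetical):

```r
library(tidyr)

# Hypothetical wide data: one column per year
wide <- data.frame(
  country = c("A", "B"),
  `2019`  = c(10, NA),
  `2020`  = c(12, 20),
  check.names = FALSE
)

# pivot_longer(): wide -> long, one row per country and year
long <- pivot_longer(wide, cols = c("2019", "2020"),
                     names_to = "year", values_to = "value")

# drop_na(): remove rows that contain missing values
long_complete <- drop_na(long)

# pivot_wider(): long -> wide again
wide_again <- pivot_wider(long, names_from = year, values_from = value)
```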
lubridate:
year(): For extracting the year from a date
month(): For extracting the month from a date
day(): For extracting the day from a date
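For example, the three extractors above applied to one arbitrarily chosen date:

```r
library(lubridate)

d <- as.Date("2013-07-04")  # example date, chosen arbitrarily

y  <- year(d)   # year component
m  <- month(d)  # month component as a number
dd <- day(d)    # day of the month
```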
ggplot2:
ggplot(): For creating visualizations
aes(): For mapping variables to visual properties
geom_point(): For creating scatter plots
geom_bar(): For creating bar charts
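A short sketch of how these ggplot2 pieces combine, using the mtcars dataset that ships with R (the choice of columns is just for illustration):

```r
library(ggplot2)

# Scatter plot: ggplot() sets the data, aes() maps columns to x and y,
# geom_point() draws one point per row
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Bar chart: geom_bar() counts the rows in each category of x
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()
```

Printing p_scatter or p_bar in the console draws the plot.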
nycflights13 package:
flights: all flights that departed from NYC in 2013
weather: hourly meteorological data for each airport
planes: construction information about each plane
airports: airport names and locations
airlines: translation between two-letter carrier codes and names
Functions:
filter(): Select specific rows based on conditions.
mutate(): Create new columns or change existing ones.
ymd(): parses a character string into a Date object (year-month-day format)
mdy(): parses a character string into a Date object (month-day-year format)
dmy(): parses a character string into a Date object (day-month-year format)
interval(): creates an interval object representing the time span between two specific points in time
duration(): creates a duration object representing an exact length of time, measured in seconds
period(): creates a period object representing a length of time in calendar units such as months or days
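The lubridate functions above can be tried out as follows. The dates are arbitrary examples, and the variable names are made up for this sketch.

```r
library(lubridate)

# Three spellings of the same date, each parsed by the matching function
a <- ymd("2013-01-31")
b <- mdy("01/31/2013")
d <- dmy("31-01-2013")

# interval(): the span between two specific points in time
span <- interval(ymd("2013-01-01"), ymd("2013-03-01"))

# duration(): an exact length of time, stored in seconds
one_day <- duration(1, units = "days")

# period(): a length of time in calendar units
one_month <- period(1, units = "month")

# Adding a period moves the date by calendar units
next_month <- ymd("2013-01-15") + one_month
```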
Output of sort(): [1] 1990 1990 1991 1991 1992 1992
Output of arrange():
1 1990 10 123
2 1990 10 123
3 1991 10 123
4 1991 10 123
5 1992 10 123
6 1992 10 123
sum(): This function calculates the sum of a numeric vector. It is not suitable for counting
the number of characters in a string.
length(): This function returns the number of elements in a vector. For a character vector
it counts the strings, not the characters within each string.
nchar(): This function returns the number of characters in each string. Use it to count
characters, for example in a name column.
nzchar(): Returns TRUE for each string that is not empty.
strsplit(): Splits each string into pieces at a separator and returns a list.
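The difference between these functions shows up on a small character vector (the name vector here is made up for illustration):

```r
name <- c("Ann", "Roberto", "")

n_elements <- length(name)     # number of strings in the vector
n_chars    <- nchar(name)      # characters in each string
non_empty  <- nzchar(name)     # FALSE only for the empty string
total      <- sum(n_chars)     # sum() works on the numeric result of nchar()

# strsplit() returns a list with one character vector per input string
parts <- strsplit("a,b,c", ",")
```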
%in%: The syntax is x %in% y, where x is the vector or column you want to check, and y is the
vector or column you want to check against. The result is a logical vector with one TRUE or
FALSE per element of x, so it can be used to pick elements of a vector or rows of a dataframe.
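A minimal sketch of %in% on a vector and on a dataframe column. The carrier codes and the trips data frame are invented for the example.

```r
x <- c("UA", "AA", "DL", "B6")
y <- c("AA", "DL")

# One TRUE/FALSE per element of x
is_match <- x %in% y

# Keep only the matching elements
matched <- x[x %in% y]

# Same idea for rows of a dataframe
trips <- data.frame(carrier = x, delay = c(5, 12, 0, 30))
trips_subset <- trips[trips$carrier %in% y, ]
```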
If you have data in a file (e.g., CSV, Excel, or text file), you can read it into R using various
functions, such as read.csv() in base R or read_csv() from readr for a CSV file.
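As a self-contained sketch, the example below first writes a tiny CSV to a temporary file and then reads it back with base R. The file contents are made up; in practice path would point at your own file.

```r
# Write a small example CSV to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("name,year", "Ann,1990", "Bob,1991"), path)

# Read in a CSV file
csv_data <- read.csv(path)

# Inspect what was read
str(csv_data)
```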
List:
Vector:
Matrix:
Dataframe:
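A quick sketch of the four structures named above, with made-up values:

```r
# Vector: elements of a single type
v <- c(1, 2, 3)

# List: elements can have different types and lengths
l <- list(id = 1, tags = c("a", "b"))

# Matrix: two-dimensional, single type
m <- matrix(1:6, nrow = 2)

# Dataframe: table whose columns can have different types
d <- data.frame(x = 1:3, y = c("a", "b", "c"))
```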
Create data frame:
df <- data.frame(
column1 = c(values),
column2 = c(values),
...
)
Functions on dataframe
str(df): prints the structure of the dataframe (columns, types, first values) in the console
sample_n(): from the dplyr package. This function takes a random sample of rows from a
dataframe. (In current dplyr, slice_sample() is the recommended replacement.)
Alternatively, you can use the base R sample() function to take a random sample of
rows. Here's an example:
# Take a random sample of 3 rows
Random_subset <- df[sample(nrow(df), 3), ]
# Print the random subset
print(Random_subset)
library(dplyr)

# Create a dataframe
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# Take a random sample of 3 rows
Random_subset <- df %>% sample_n(3)

# Print the random subset
print(Random_subset)
Column:
Use names(df) or str(df) to see the column names.
Access specific columns in a dataframe using the $ operator or the [[ ]] operator:
df$year or df[["year"]]
To keep only the rows whose name occurs more than once:
df %>%
  group_by(name) %>%
  filter(n() > 1)
Row:
Access the first row of the df dataframe: df[1, ]
Access the first column: df[, 1]
------------------------------------------------------------------------------------------------------------
Deleting a row:
Use the slice() function from dplyr. It takes a dataframe and a vector of row indices as
arguments. df <- df %>% slice(-1) keeps all rows except the first row.
Alternatively, you can use a negative index with the [ operator to remove the first row, like df <- df[-1, ]
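Both ways of dropping the first row can be compared on a small example (df_demo is a made-up dataframe used only here):

```r
library(dplyr)

df_demo <- data.frame(id = 1:4, score = c(10, 20, 30, 40))

# dplyr: a negative index in slice() drops that row
without_first_slice <- df_demo %>% slice(-1)

# base R: a negative row index inside [ ] does the same
without_first_base <- df_demo[-1, ]
```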
Changing a row:
To change a row in a dataframe, use the [ operator to access the row and assign new
values to it. For example, the first row of the df dataframe is reached with df[1, ] and a
single cell with df[1, "year"].
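A minimal sketch of changing a row, assuming a small made-up dataframe (df_rows and its values are hypothetical):

```r
df_rows <- data.frame(name = c("a", "b"),
                      year = c(1990, 1991),
                      stringsAsFactors = FALSE)

# Replace the whole first row: one value per column, in column order
df_rows[1, ] <- list("z", 2000)

# Replace a single cell in the second row
df_rows[2, "year"] <- 1995
```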
rowwise():
This is a function from the dplyr package that groups the dataframe by rows.
When you use rowwise(), each row of the dataframe is treated as a separate group.
This is similar to using apply(df, 1, ...) in base R, but rowwise() keeps each column's
type and fits into a dplyr pipeline.
With rowwise() followed by mutate(), sum() is applied to each row of the dataframe, and the
result is added to the dataframe as a new column named sum.
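The row-by-row sum described above might look like this. The scores dataframe and its columns x, y and z are assumptions made for the sketch.

```r
library(dplyr)

scores <- data.frame(x = c(1, 2), y = c(10, 20), z = c(100, 200))

scores_sum <- scores %>%
  rowwise() %>%                 # treat each row as its own group
  mutate(sum = sum(x, y, z)) %>% # sum() now sees one row at a time
  ungroup()                     # drop the row-wise grouping again
```

Without rowwise(), sum(x, y, z) would add up all values of the three columns and repeat that single total in every row.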
library(dplyr)
# Keep rows whose combination of year, sex, name, n and prop occurs more than once
df %>%
  group_by(year, sex, name, n, prop) %>%
  filter(n() > 1)
Class:
Seq:
Arithmetic functions:
Operators:
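A few base R one-liners for the topics named above. The values are arbitrary and the selection of functions is only a suggestion.

```r
# Class: the type of an object
cls_num <- class(1.5)
cls_chr <- class("a")
cls_lgl <- class(TRUE)

# Seq: regular sequences
by_three <- seq(1, 10, by = 3)
one_to_four <- seq_len(4)

# Arithmetic functions
total <- sum(c(1, 2, 3))
avg   <- mean(c(1, 2, 3))
root  <- sqrt(16)

# Operators: integer division, remainder, power, comparison
int_div   <- 7 %/% 2
remainder <- 7 %% 2
power     <- 2^3
is_bigger <- 5 > 3
```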