Sam BRM Rstudio
METHODOLOGY
PRACTICAL FILE
PRESENTED BY
Samridhi
03019301722
Introduction To R
▶ The name "R" is derived from the first letter of the first names of its two developers, Ross Ihaka and
Robert Gentleman, who were associated with the University of Auckland at the time.
▶ The initial version of R was released in 1995 to allow academic statisticians and others with
sophisticated programming skills to perform complex statistical analysis of data and
display the results in a multitude of visual graphics.
▶ Languages such as C++ require that an entire section of code be written, compiled, and
run before results can be seen, but in R the result of each command can be seen immediately, one command at a time.
R Studio:
▶ R is one of the programming languages that provide an intensive environment for you to
research, process, transform, and visualize information.
▶ Many users of the R programming language like the fact that it is free to download, offers
sophisticated data analytics capabilities, and has an active online community of users they
can turn to for support.
1. The lower left pane is the R console, which is used to write commands and functions (it is
the place where R waits for you to tell it what to do, and where it shows the results of
a command).
2. The upper left pane is the script editor: it takes the place of a text editor but is far more powerful than one.
3. The upper right pane holds information about the workspace, command history, and files in
the current folder.
4. The lower right pane displays plots, packages, information, and help files.
R Commands:
1. The console pane in RStudio is the place where commands written in the R language can
be typed and executed immediately by the computer.
2. It is also where the results will be shown for commands that have been executed. You
can type commands directly into the console and press Enter to execute those
commands, but they will be forgotten when you close the session.
3. If R is ready to accept commands, the R console by default shows a > prompt.
4. If it receives a command (by typing, copy-pasting, or sending from the script editor using
Ctrl + Enter), R will try to execute it, and when ready, will show the results and come back
with a new > prompt to wait for new commands.
5. If R is still waiting for you to enter more data because it isn’t complete yet, the console will
show a + prompt. It means that you haven’t finished entering a complete command.
Types of Data in R-studio:
▶ According to RStudio, the four main types of data most likely to be used are:
1. Numeric data,
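To see which type R has given a value, the built-in class() function can be used. The sketch below is illustrative only: the variable names are made up, and it simply shows how a few common base R types are reported.

```r
# Illustrative values of some common base R data types
num_value  <- 42.5      # numeric (a double-precision number)
int_value  <- 42L       # integer: the L suffix asks R for an integer
text_value <- "hello"   # character (text)
flag_value <- TRUE      # logical (TRUE or FALSE)

# class() reports the type of each value
print(class(num_value))   # "numeric"
print(class(int_value))   # "integer"
print(class(text_value))  # "character"
print(class(flag_value))  # "logical"
```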
R Operators:
▶ Arithmetic operators: The R arithmetic operators allow us to do math operations, like sums,
divisions, or multiplications, among others.
▶ Assignment operators in R: The assignment operators in R allow you to assign data to a named
object in order to store the data, for example <- (left assignment) and -> (right assignment).
▶ Ctrl + L clears the console.
Variable
Variables can be thought of as labelled containers used to store information. Variables allow us
to recall saved information for later use in calculations. Variables in RStudio can store many different
things, from single values to tables of information, images and graphs.
Storing a value or “assigning” it to a variable is done using the <-, = or -> operator. The
name given to a variable should describe the information or data being stored. This helps when
revisiting old code or when sharing it with others.
> num1=2
> name="sumit"
> feepaid=TRUE
> num1=2
> num1
[1] 2
> num2<-4
> num2
[1] 4
> 7->num3
> num3
[1] 7
Arithmetic Operators
Addition
> 2+1
[1] 3
> 2+6
[1] 8
Subtraction
> 2-4
[1] -2
Multiplication
> 2*9
[1] 18
Division
> 2/7
[1] 0.2857143
EXPONENT/POWER
> 2^5
[1] 32
> num1=15
> num2=5
> num1+num2
[1] 20
> num1-num2
[1] 10
> num1*num2
[1] 75
> num1/num2
[1] 3
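The arithmetic operators above are not the only ones: base R also provides %/% (integer division) and %% (remainder). A short script-style sketch, using values of our own choosing (17 and 5) so that the remainder is not zero:

```r
num1 <- 17
num2 <- 5

print(num1 + num2)    # 22
print(num1 - num2)    # 12
print(num1 * num2)    # 85
print(num1 / num2)    # 3.4
print(num1 ^ 2)       # 289
print(num1 %/% num2)  # 3: how many whole times 5 fits into 17
print(num1 %% num2)   # 2: what is left over
```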
Use of Relational Operators
> num1=10
> num2=4
> num1>num2
[1] TRUE
> num1<num2
[1] FALSE
> num1==num2
[1] FALSE
> num1!=num2
[1] TRUE
> num1>=num2
[1] TRUE
> num1<=num2
[1] FALSE
> num1=9
> num2=4
> num1>num2
[1] TRUE
> num1<num2
[1] FALSE
> num1==num2
[1] FALSE
> num1=20
> num2=10
> num1>num2
[1] TRUE
> num1!=num2
[1] TRUE
> num1>=num2
[1] TRUE
> num1<=num2
[1] FALSE
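Relational operators also work element by element on vectors (covered in the next section), returning one TRUE or FALSE per element. A small sketch with made-up values:

```r
num1 <- c(1, 5, 10)   # three values compared at once
num2 <- 4

print(num1 > num2)        # FALSE  TRUE  TRUE
print(num1 == 5)          # FALSE  TRUE FALSE
print(sum(num1 > num2))   # 2: each TRUE counts as 1
```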
Vectors in R studio:
In R, a sequence of elements that share the same data type is known as a vector. A single item such as 2 is
often thought of as a simple variable (R actually stores it as a vector of length one), while a number of items
combined together with c() is called a vector.
> vec1=c(6,5,4,3,2,1)
> vec1
[1] 6 5 4 3 2 1
> class(vec1)
[1] "numeric"
> vec2=c("a","b","c","d","e")
> vec2
[1] "a" "b" "c" "d" "e"
> class(vec2)
[1] "character"
> vec3=c(F,T,F,T)
> vec3
[1] FALSE  TRUE FALSE  TRUE
> class(vec3)
[1] "logical"
> vec4=c(1,"a",4,"b",3)
> vec4
[1] "1" "a" "4" "b" "3"
> class(vec4)
[1] "character"
> vec5=c(1,T,2,F,3,F)
> vec5
[1] 1 1 2 0 3 0
> class(vec5)
[1] "numeric"
> vec6=c("d",T,"b",F,"c")
> vec6
[1] "d"     "TRUE"  "b"     "FALSE" "c"
> class(vec6)
[1] "character"
> vec7=c(1,"a",T,2,"e",F)
> vec7
[1] "1"     "a"     "TRUE"  "2"     "e"     "FALSE"
> class(vec7)
[1] "character"
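The examples above follow R's coercion rule: when types are mixed in one vector, everything is converted to the most flexible type present (logical, then numeric, then character). The as.numeric(), as.character() and as.logical() functions convert explicitly. A minimal sketch:

```r
mixed_num <- c(1, TRUE, FALSE)   # logical values become 1 and 0
mixed_chr <- c(1, "a", TRUE)     # everything becomes text

print(mixed_num)                 # 1 1 0
print(mixed_chr)                 # "1" "a" "TRUE"

print(as.numeric("3.5"))           # 3.5: text converted to a number
print(as.character(12))            # "12"
print(as.logical(c("TRUE", "F")))  # TRUE FALSE
# Text that is not a number becomes NA (R would normally also warn)
print(suppressWarnings(as.numeric("a")))  # NA
```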
Vector Arithmetic:
> vec1=c(1,2,3,4,5,6)
> vec2=c(1,1,1,1,1,1)
> vec1+vec2
[1] 2 3 4 5 6 7
> vec1-vec2
[1] 0 1 2 3 4 5
> vec1*vec2
[1] 1 2 3 4 5 6
> vec1/vec2
[1] 1 2 3 4 5 6
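Vector arithmetic also works when one operand is shorter: R reuses ("recycles") the shorter vector until it matches the longer one. A minimal sketch; R gives a warning if the longer length is not a multiple of the shorter one.

```r
vec1 <- c(1, 2, 3, 4, 5, 6)

print(vec1 + 10)         # 11 12 13 14 15 16: the single 10 is reused
print(vec1 * c(1, -1))   # 1 -2 3 -4 5 -6: c(1, -1) is repeated three times
print(vec1 / 2)          # 0.5 1.0 1.5 2.0 2.5 3.0
print(vec1 ^ 2)          # 1 4 9 16 25 36
```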
Vector indexing:
Now try finding V1[4] or V2[3] etc. to find the value of the 4th or 3rd item. Also, try finding the length and
class of V1 and V2.
1. Write V1[4] to extract the 4th element of the vector V1.
2. To know the number of items in a vector, write length(V1); R will show you the total
number of items.
3. To know the class of a vector, use the command class(V1); it will show whether it is
numeric, character, logical, etc. (R is case-sensitive, so Length() and Class() will not work.)
> vec1=c(5,16,33,23,24,40,35,17)
> vec1[2]
[1] 16
> length(vec1)
[1] 8
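Square brackets accept more than a single position. The sketch below reuses the vec1 values above to show a few other base R indexing forms:

```r
vec1 <- c(5, 16, 33, 23, 24, 40, 35, 17)

print(vec1[c(1, 3)])       # 5 33: several positions at once
print(vec1[2:4])           # 16 33 23: a range of positions
print(vec1[-1])            # every element except the first
print(vec1[vec1 > 30])     # 33 40 35: elements meeting a condition
print(vec1[length(vec1)])  # 17: the last element
```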
Lists in R studio:
Lists are R objects that can contain elements of different types, such as numbers, strings, vectors,
and even other lists. A list is created using the list() function.
> l1=list(1,"a",TRUE)
> l1
[[1]]
[1] 1
[[2]]
[1] "a"
[[3]]
[1] TRUE
> class(l1[[1]])
[1] "numeric"
> class(l1[[2]])
[1] "character"
> class(l1[[3]])
[1] "logical"
List of Vectors
> l2=list(c(1,2,3),c("a","b","c"),c(T,F,T))
> l2
[[1]]
[1] 1 2 3
[[2]]
[1] "a" "b" "c"
[[3]]
[1]  TRUE FALSE  TRUE
> l2[[2]][1]
[1] "a"
> l2[[1]][3]
[1] 3
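List elements can also be given names and then read with $ or [["name"]]. A small sketch; the list and its element names are made up for illustration:

```r
student <- list(name = "sumit", marks = c(8, 10, 12), feepaid = TRUE)

print(student$name)           # "sumit"
print(student[["marks"]][2])  # 10: second element of the marks vector
print(names(student))         # "name" "marks" "feepaid"

student$age <- 20             # add a new element to the list
print(length(student))        # 4
```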
Matrices in R studio:
In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created by passing a vector
as input to the matrix() function. On R matrices, we can perform addition, subtraction, multiplication,
and division operations.
In an R matrix, elements are arranged in a fixed number of rows and columns, and all elements share the
same data type; in the examples below they are real numbers.
> m1=matrix(c(6,5,4,3,2,1))
> m1
[,1]
[1,] 6
[2,] 5
[3,] 4
[4,] 3
[5,] 2
[6,] 1
> m1=matrix(c(1,2,3,4,5,6),nrow=2,ncol=3)
> m1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> m1=matrix(c(1,2,3,4,5,6),nrow=2,ncol=3,byrow=T)
> m1
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> m1[1,2]
[1] 2
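The text above mentions arithmetic on matrices. In base R, +, -, * and / act element by element on matrices of the same shape, while %*% gives the true matrix product and t() transposes. A minimal sketch using the byrow matrix above:

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
m2 <- matrix(1, nrow = 2, ncol = 3)   # a 2 x 3 matrix filled with ones

print(dim(m1))         # 2 3: rows, then columns
print(m1 + m2)         # element-wise addition
print(m1 * 2)          # every element doubled
print(m1 / m1)         # element-wise division: all ones
print(t(m1))           # transpose: a 3 x 2 matrix
print(m1 %*% t(m1))    # matrix product: a 2 x 2 matrix
```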
Array in R studio:
An array is a data structure that can hold multi-dimensional data. In R, an array can hold
two or more layers of two-dimensional data. Arrays are built from vectors: an array is a vector with a
dimension attribute. A vector is an array of numbers with a single index, while a matrix is an array of
numbers with two indices.
▶ Uni-dimensional arrays are called vectors with the length being their only dimension.
▶ Two-dimensional arrays are called matrices, consisting of fixed numbers of rows and
columns.
> vec1=c(1,2,3,4,5,6)
> vec2=c(7,8,9,10,11,12)
> a1=array(c(vec1,vec2),dim=c(2,3,2))
> a1
, , 1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
, , 2
     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
> a1[1,2,1]
[1] 3
> a1[1,1,2]
[1] 7
> a1[2,3,2]
[1] 12
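Leaving an index empty selects everything along that dimension, so whole layers, rows or columns of an array can be pulled out at once. A short sketch rebuilding a1 from above:

```r
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
a1 <- array(c(vec1, vec2), dim = c(2, 3, 2))

print(dim(a1))      # 2 3 2: rows, columns, layers
print(a1[, , 2])    # the whole second layer (a 2 x 3 matrix)
print(a1[1, , 1])   # first row of the first layer: 1 3 5
print(a1[2, 3, ])   # element [2,3] from each layer: 6 12
```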
Data Frame
> fruits=data.frame(fruit_name=c("apple","banana","mango"),fruit_cost=c(100,200,300))
> fruits
  fruit_name fruit_cost
1      apple        100
2     banana        200
3      mango        300
> fruits$fruit_cost
[1] 100 200 300
> fruits$fruit_name
[1] "apple"  "banana" "mango"
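A data frame is a table whose columns can hold different types (here text and numbers). A minimal sketch of a few common base R operations on the fruits data frame; the in_stock column is made up for illustration:

```r
fruits <- data.frame(fruit_name = c("apple", "banana", "mango"),
                     fruit_cost = c(100, 200, 300))

print(nrow(fruits))             # 3 rows
print(ncol(fruits))             # 2 columns
print(mean(fruits$fruit_cost))  # 200

fruits$in_stock <- c(TRUE, FALSE, TRUE)   # add a new column
print(fruits[fruits$fruit_cost > 150, ])  # only the banana and mango rows
```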
Functions in R studio:
▶ A function is a set of statements organized together to perform a specific task. R has a large
number of built-in functions, and users can also create their own functions.
▶ We can create user-defined functions in R. They are specific to what a user wants, and once
created they can be used like the built-in functions.
> times10=function(x)
print(x*10)
> times10(5)
[1] 50
> new.function=function(a)
{for(i in 1:a)
{b=i^2
print(b)}}
> new.function(6)
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36
> new.function=function(a,b,c)
{result=a*b+c
print(result)}
> new.function(2,3,4)
[1] 10
> new.function(5,2,4)
[1] 14
> new.function(a=7,b=5,c=3)
[1] 38
▶ In this way we have created our own function in RStudio, which computes a*b+c. We can
create many other functions like this to make our work more efficient and save time.
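Functions can also give arguments default values and hand their result back with return() instead of only printing it, so the result can be stored and reused. A sketch of that variation; the default c = 0 is our own choice:

```r
# Same a*b+c idea, but returning the value and defaulting c to 0
new.function <- function(a, b, c = 0) {
  result <- a * b + c
  return(result)
}

print(new.function(2, 3, 4))                      # 10
print(new.function(7, 5))                         # 35: c falls back to 0
total <- new.function(a = 5, b = 2, c = 4) + 1    # reuse the returned value
print(total)                                      # 15
```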
Simple programming constructs such as if… else, for, while, break.
When we’re programming in R (or any other language, for that matter), we often want to control
when and how particular parts of our code are executed. We can do that using control structures like
if-else statements, for loops, and while loops.
▶ The IF Conditional Statement: Let’s say we’re watching a sports match that decides
which team makes the playoffs. We could visualize the possible outcomes using this tree
chart:
IF STATEMENT:
▶ As we can see in the tree chart, there are only two possible outcomes. If Team A wins, they go
to the playoffs. If Team B wins, then Team B goes instead.
> teamA=5
> teamB=3
> if(teamA>teamB)
{print("Team A Wins")}
[1] "Team A Wins"
ELSE STATEMENT:
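What if Team A had 1 goal and Team B had 3 goals? Our teamA > teamB conditional would
evaluate to FALSE. As a result, nothing would be printed if we ran our code: because the if
condition evaluates to FALSE, the code block inside the if statement is not executed. In this
case we can add an else block, which runs only when the condition is FALSE. At the console,
else must be on the same line as the closing brace of the if block:
> teamA=1
> teamB=3
> if(teamA>teamB)
{print("Team A Wins")} else
{print("Team B Wins")}
[1] "Team B Wins"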
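When there are more than two outcomes, if and else can be chained with else if. A minimal sketch; match_result() is a hypothetical helper, and the "Draw" case is an addition for illustration:

```r
# Hypothetical helper: report the result of a match from the two scores
match_result <- function(teamA, teamB) {
  if (teamA > teamB) {
    "Team A Wins"
  } else if (teamA < teamB) {
    "Team B Wins"
  } else {
    "Draw"           # reached only when the scores are equal
  }
}

print(match_result(5, 3))  # "Team A Wins"
print(match_result(1, 3))  # "Team B Wins"
print(match_result(2, 2))  # "Draw"
```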
FOR LOOP:
It is a type of control statement that enables one to easily construct a loop that has to run
statements or a set of statements multiple times. For loop is commonly used to iterate over
items of a sequence.
for(value in sequence){statement}
> for(i in 1:4){print(i)}
[1] 1
[1] 2
[1] 3
[1] 4
> for(i in 1:9){print(i)}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
> for(i in 1:10){print(i^2)}
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36
[1] 49
[1] 64
[1] 81
[1] 100
> for(i in 1:9){print(i*2)}
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
[1] 12
[1] 14
[1] 16
[1] 18
> week=c("sunday","monday","tuesday","wednesday","thursday","friday","saturday")
> for(day in week){print(day)}
[1] "sunday"
[1] "monday"
[1] "tuesday"
[1] "wednesday"
[1] "thursday"
[1] "friday"
[1] "saturday"
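A for loop can also build up a result as it goes, and seq_along() gives the positions of a vector's elements. A short sketch (the running total of squares is our own example):

```r
week <- c("sunday", "monday", "tuesday", "wednesday",
          "thursday", "friday", "saturday")

total <- 0
for (i in 1:10) {
  total <- total + i^2       # add 1, 4, 9, ..., 100
}
print(total)                 # 385

for (i in seq_along(week)) {
  cat(i, week[i], "\n")      # position and day on one line, e.g. 1 sunday
}
```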
While LOOP:
It is a type of control statement that runs a statement or a set of statements repeatedly
until the given condition becomes false. A while loop in R is a close cousin of the for loop
in R. However, a while loop checks a logical condition and keeps running the loop as long
as the condition is true.
while(condition){statement}
▶ If the condition in the while loop in R is always true, the while loop will be an infinite loop, and
our program will never stop running.
> i=1
> while(i<=9)
{print(i)
i=i+1}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
▶ Example: Let’s take a team that’s starting the season with zero wins. They’ll need to win 10
matches to make the playoffs. We can write a while loop that adds one win at a time until the
team reaches 10 wins:
> wins=0
> while(wins<10)
{wins=wins+1}
> wins
[1] 10
The loop keeps running while the condition wins<10 is TRUE and stops once it becomes FALSE,
that is, when the number of wins reaches 10.
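Because a while loop checks its condition before every pass, it suits cases where the number of repetitions is not known in advance. An illustrative sketch (our own example, not from the source): halving a number until it drops below 1 and counting the steps.

```r
x <- 100
steps <- 0
while (x >= 1) {        # keep going while the condition is TRUE
  x <- x / 2
  steps <- steps + 1
}
print(steps)            # 7
print(x)                # 0.78125
```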
Break statement in R:
> i=1
> while(i<=10)
{print(i)
if(i==4)
break
i=i+1}
[1] 1
[1] 2
[1] 3
[1] 4
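break works the same way inside a for loop. In this sketch the check comes before print() (unlike the while example above), so 4 itself is not printed; the seen vector is added only so the result can be inspected:

```r
seen <- c()
for (i in 1:10) {
  if (i == 4) {
    break                # leave the loop as soon as i reaches 4
  }
  print(i)
  seen <- c(seen, i)     # remember which values were processed
}
print(seen)              # 1 2 3
```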
> q()
Inbuilt Functions of R
> View(iris)
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> table(iris$Species)
setosa versicolor virginica
50 50 50
> min(iris$Sepal.Length)
[1] 4.3
> max(iris$Sepal.Length)
[1] 7.9
> range(iris$Sepal.Length)
[1] 4.3 7.9
> mean(iris$Sepal.Length)
[1] 5.843333
> marks=c(8,10,12,15,20,7,6,5,8,3,2,12,8,9,7,15,8)
Mean:
> mean(marks)
[1] 9.117647
> mean(iris$Sepal.Length)
[1] 5.843333
Median:
> median(marks)
[1] 8
> median(iris$Sepal.Length)
[1] 5.8
> table(marks)
marks
2 3 5 6 7 8 9 10 12 15 20
1 1 1 1 2 4 1 1 2 2 1
Mode:
> names(sort(-table(marks))[1])
[1] "8"
> names(sort(-table(iris$Sepal.Length))[1])
[1] "5"
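Base R has no built-in function for the statistical mode (mode() returns how an object is stored, such as "numeric"), so the table() approach above is common. A sketch wrapping it in a reusable helper; get_mode() is a hypothetical name, and with ties it returns the smallest of the most frequent values:

```r
# Hypothetical helper: most frequent value of a numeric vector
get_mode <- function(x) {
  counts <- table(x)                             # frequency of each distinct value
  as.numeric(names(counts)[which.max(counts)])   # which.max picks the first maximum
}

marks <- c(8, 10, 12, 15, 20, 7, 6, 5, 8, 3, 2, 12, 8, 9, 7, 15, 8)
print(get_mode(marks))               # 8
print(get_mode(iris$Sepal.Length))   # 5
print(mode(marks))                   # "numeric": storage mode, not the statistical mode
```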
Summary statistics
• R provides a wide range of functions for obtaining summary statistics. One method of
obtaining descriptive statistics is the summary() function, applied to a vector or a data
column, for example summary(marks).
• It calculates the descriptive statistics of the whole data series. For a numeric vector, the
values calculated are:
o Minimum
o 1st quartile
o Median
o Mean
o 3rd quartile
o Maximum
> marks=c(8,10,12,15,20,7,6,5,8,3,2,12,8,9,7,15,8)
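> summary(marks)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.000   7.000   8.000   9.118  12.000  20.000
> summary(iris$Sepal.Length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  4.300   5.100   5.800   5.843   6.400   7.900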
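The 1st Qu. and 3rd Qu. columns of summary() are the 25% and 75% quantiles, which quantile() reports directly. A minimal sketch checking this on the marks data:

```r
marks <- c(8, 10, 12, 15, 20, 7, 6, 5, 8, 3, 2, 12, 8, 9, 7, 15, 8)

s <- summary(marks)
print(s)
print(quantile(marks))   # 0%, 25%, 50%, 75% and 100% points

# The quartiles in summary() match quantile()'s 25% and 75% values
print(s[["1st Qu."]] == quantile(marks, 0.25))
print(s[["3rd Qu."]] == quantile(marks, 0.75))
```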
• To present the data as a simple plot, we just need the command plot(x), where x is the
vector or data column to be plotted.
• This draws a simple, basic plot of the selected data, which looks like
this:
> marks=c(8,10,12,15,20,7,6,5,8,3,2,12,8,9,7,15,8)
> plot(marks)
> plot(iris$Sepal.Length)
Colored Quick plots:
• We can also get a colored version of a plot by adding the col argument to the command, for
example plot(x, col=2). The number selects a colour from R's default palette:
1- Black
2- Red
3- Green
4- Blue
5- Aqua (cyan)
6- Pink (magenta)
> marks=c(8,10,12,15,20,7,6,5,8,3,2,12,8,9,7,15,8)
> plot(marks,col=2)
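The numbers given to col index R's current colour palette, which palette() lists; colour names such as "blue" also work. A sketch that draws to a null pdf device only so it runs without a screen (in RStudio the pdf(NULL) and dev.off() lines can be dropped):

```r
marks <- c(8, 10, 12, 15, 20, 7, 6, 5, 8, 3, 2, 12, 8, 9, 7, 15, 8)

print(palette())   # the colours that col = 1, 2, 3, ... refer to

pdf(NULL)          # off-screen device, only so the sketch runs anywhere
plot(marks, col = 4, main = "Marks", xlab = "Student", ylab = "Marks")
plot(marks, col = "blue")                                 # a colour name works too
plot(iris$Sepal.Length, col = as.integer(iris$Species))   # one colour per species
invisible(dev.off())
```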
Histogram
🞅 A histogram is a graph that shows the frequency of numerical data using rectangles.
🞅 The height of a rectangle (the vertical axis) represents the distribution frequency of a variable
(the amount, or how often that variable appears).
🞅 The width of a rectangle (horizontal axis) represents a range, or bin, of values of the variable
(for instance, minutes, years, or ages).
🞅 The histogram displays the frequency distribution as a two-dimensional figure, meaning both the
height and the width of the rectangles have particular meanings and can vary. A bar chart is
effectively one-dimensional: only the height of its bars carries meaning.
🞅 To draw a colored histogram, use the command hist(x, col="red").
🞅 To add a label to the horizontal axis, add the xlab= argument to the above command.
🞅 To add a heading to the histogram, add the main= argument to the above command.
> marks=c(8,10,12,15,20,7,6,5,8,3,2,12,8,9,7,15,8)
> hist(iris$Sepal.Length)
> hist(marks)
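The col, main and xlab arguments described above can be combined in one call, and hist(..., plot = FALSE) returns the bin edges and counts instead of drawing. A sketch, again using a null pdf device only so it runs off-screen:

```r
marks <- c(8, 10, 12, 15, 20, 7, 6, 5, 8, 3, 2, 12, 8, 9, 7, 15, 8)

pdf(NULL)
hist(marks, col = "red", main = "Distribution of marks", xlab = "Marks")
invisible(dev.off())

h <- hist(marks, plot = FALSE)   # compute the bins without drawing
print(h$breaks)                  # edges of the bins on the horizontal axis
print(h$counts)                  # how many marks fall in each bin
```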
PIE CHARTS
🞅 A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to
illustrate numerical proportions.
🞅 In a pie chart, the arc length of each slice (and consequently its central angle
and area) is proportional to the quantity it represents.
🞅 Pie charts are created with the function pie(x, labels=), where x is a non-negative
numeric vector giving the size of each slice and labels is a character vector of names
for the slices.
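A short pie() sketch, using table() to count each iris species and prop.table() to show the share of each slice; the colours and title are our own choices, and the null pdf device is used only so the sketch runs off-screen:

```r
species_counts <- table(iris$Species)   # 50 flowers of each species

pdf(NULL)   # off-screen device so the sketch runs anywhere; omit in RStudio
pie(species_counts,
    labels = names(species_counts),
    col = c("red", "green", "blue"),
    main = "Iris species")
invisible(dev.off())

print(round(100 * prop.table(species_counts), 1))   # 33.3 for each species
```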