Sam BRM Rstudio


BUSINESS RESEARCH

METHODOLOGY
PRACTICAL FILE

PRESENTED BY

Samridhi
03019301722
Introduction To R

▶ The "R" name is derived from the first letter of the names of its two developers, Ross Ihaka and
Robert Gentleman, who were associated with the University of Auckland at the time.

▶ The initial version of R was released in 1995 to allow academic statisticians and others with
sophisticated programming skills to perform complex statistical data analysis and
display the results in the form of a multitude of visual graphics.

▶ Commands can be anything from simple mathematical operators, including +, -, *, and /,
to more complicated functions that perform linear regressions and other advanced
calculations.

▶ Languages such as C++ require that an entire section of the code be written, compiled, and
run to see results, but in R the results can be seen after each command, one at a time.

R Studio:

▶ R is one of the programming languages that provide an intensive environment for you to
research, process, transform, and visualize information.

▶ Many users of the R programming language like the fact that it is free to download, offers
sophisticated data analytics capabilities, and has an active community of users online where
they can turn to for support.

▶ It has 9000+ packages.

▶ It is mainly used for complex data analysis in data science.

The General Layout of R-Studio:

RStudio consists of 4 panes/panels:

1. The lower left pane is the R console, which is used to type commands and functions (it is
the place where R waits for you to tell it what to do, and where it shows the results of
a command).

2. The upper left pane is the script editor, which takes the place of a text editor but is far more
powerful than one.

3. The upper right pane holds information about the workspace and the command history.

4. The lower right pane displays the files in the current folder, plots, packages, and help files.
R Commands:

1. The console pane in RStudio is the place where commands written in the R language can
be typed and executed immediately by the computer.
2. It is also where the results will be shown for commands that have been executed. You
can type commands directly into the console and press Enter to execute those
commands, but they will be forgotten when you close the session.
3. If R is ready to accept commands, the R console by default shows a > prompt.
4. If it receives a command (by typing, copy-pasting, or sending from the script editor using
Ctrl + Enter), R will try to execute it, and when ready, will show the results and come back
with a new > prompt to wait for new commands.
5. If R is still waiting for more input because the command isn’t complete yet, the console will
show a + prompt. It means that you haven’t finished entering a complete command.
Types of Data in R-studio:

▶ In R, the four main types of data most likely to be used are:

1. Numeric data,

2. Character data (strings such as "x"),

3. Date data (time-based),

4. Logical data (true/false).
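
The four types listed above can be checked with class(). The following is a minimal sketch; the variable names and values are made up for illustration and do not come from the practical file.

```r
# One value of each of the four common types (illustrative values).
price    <- 12.5                    # numeric
label    <- "x"                     # character
joined   <- as.Date("2024-01-15")   # Date
fee_paid <- TRUE                    # logical

class(price)     # "numeric"
class(label)     # "character"
class(joined)    # "Date"
class(fee_paid)  # "logical"
```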

R Operators:

▶ Arithmetic operators: The R arithmetic operators allow us to do math operations, like sums,
divisions, or multiplications, among others.

▶ Logical/boolean operators: In addition, boolean or logical operators in R, such as & (and),
| (or), and ! (not), are used to combine multiple conditions. They return TRUE or FALSE values.

▶ Relational/comparison operators: Comparison or relational operators are designed to
compare objects: Greater than (>), Less than (<), Greater than or equal to (>=), Less than
or equal to (<=), Equal to (==), and Not equal to (!=).

▶ Assignment operators in R: The assignment operators in R allow you to assign data to a named
object in order to store the data like (<-)Left assignment, (->)Right assignment.

Command to clear R Console

CTRL + L

Variable

Variables can be thought of as a labelled container used to store information. Variables allow us
to recall saved information to later use in calculations. Variables can store many different things
in R studio, from single values to tables of information, images and graphs.

Defining and Assigning values to variables

Storing a value or “assigning” it to a variable is done with one of the assignment operators <-, = or ->. The
name given to a variable should describe the information or data being stored. This helps when
revisiting old code or when sharing it with others.
> num1=2

> name="sumit"

> feepaid=TRUE

Assignment Operator: =, <-, ->

> num1=2

> num1

[1] 2

> num2<-4

> num2

[1] 4

> 7->num3

> num3

[1] 7

Arithmetic Operators

Addition

> 2+1

[1] 3

> 2+6

[1] 8

Subtraction

> 2-4
[1] -2

Multiplication

> 2*9

[1] 18

Division

> 2/7

[1] 0.2857143

EXPONENT/POWER

> 2^5

[1] 32

Use of Arithmetic Operators

> num1=15

> num2=5

> num1+num2

[1] 20

> num1-num2

[1] 10

> num1*num2

[1] 75

> num1/num2

[1] 3
Use of Relational Operators

> num1=10

> num2=4

> num1>num2

[1] TRUE

> num1<num2

[1] FALSE

> num1==num2

[1] FALSE

> num1!=num2

[1] TRUE

> num1>=num2

[1] TRUE

> num1<=num2

[1] FALSE

Use of Logical Operator: & (and)

> num1=9

> num2=4

> num1>5 & num2<6

[1] TRUE

> num1>5 & num2<3

[1] FALSE

> num1>15 & num2<15


[1] FALSE
> num1>20 & num2<2

[1] FALSE

Use of Logical Operator: | (or)

> num1=20

> num2=10

> num1>5 | num2<6

[1] TRUE

> num1>5 | num2<4

[1] TRUE

> num1>20 | num2<20

[1] TRUE

> num1>30 | num2<2

[1] FALSE
Vectors in R studio:

In R, a sequence of elements that share the same data type is known as a vector. Even a single value
such as 2 is stored as a vector of length one; several values combined with c() form a longer vector.

> vec1=c(6,5,4,3,2,1)

> vec1

[1] 6 5 4 3 2 1

> class(vec1)

[1] "numeric"

> vec2=c("a","b","c","d","e")

> vec2

[1] "a" "b" "c" "d" "e"

> class(vec2)

[1] "character"

> vec3=c(F,T,F,T)

> vec3

[1] FALSE TRUE FALSE TRUE

> class(vec3)

[1] "logical"

> vec4=c(1,"a",4,"b",3)

> vec4

[1] "1" "a" "4" "b" "3"

> class(vec4)

[1] "character"

> vec5=c(1,T,2,F,3,F)
> vec5

[1] 1 1 2 0 3 0

> class(vec5)

[1] "numeric"

> vec6=c("d",T,"b",F,"c")

> vec6

[1] "d" "TRUE" "b" "FALSE" "c"

> class(vec6)

[1] "character"

> vec7=c(1,"a",T,2,"e",F)

> vec7

[1] "1" "a" "TRUE" "2" "e" "FALSE"

> class(vec7)

[1] "character"
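
The mixed vectors above follow R's coercion rule: when types are mixed inside c(), every element is converted to the most general type present, in the order logical, numeric, character. A small sketch of that rule (the variable names are illustrative):

```r
# Mixing types in c() coerces every element to the most general type:
# logical -> numeric -> character.
mixed_num <- c(1, TRUE, FALSE)   # logicals become 1 and 0
mixed_chr <- c(1, "a", TRUE)     # everything becomes a string

class(mixed_num)  # "numeric"
class(mixed_chr)  # "character"
sum(mixed_num)    # 2
```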

Vector Arithmetic:

> vec1=c(1,2,3,4,5,6)

> vec2=c(1,1,1,1,1,1)

> vec1+vec2

[1] 2 3 4 5 6 7

> vec1-vec2

[1] 0 1 2 3 4 5

> vec1*vec2

[1] 1 2 3 4 5 6

> vec1/vec2

[1] 1 2 3 4 5 6

Vector indexing:

Square brackets extract elements by position, counting from 1. Try vec1[4] or vec2[3] to find the
4th or 3rd element. Also, try finding the length and class of vec1 and vec2.

1. Write vec1[4] to extract the 4th element from the vector vec1.

2. To know the number of items in a vector, write length(vec1); R will show you the total
number of items.

3. To know the class of a vector, use the command class(vec1); it will show whether the vector
is numeric, character, logical, etc. R is case-sensitive, so these function names must be typed
in lower case.

> vec1=c(5,16,33,23,24,40,35,17)

> vec1[2]

[1] 16

> length(vec1)

[1] 8
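
Beyond a single position, square brackets also accept ranges, several positions, negative positions, and logical conditions. A brief sketch using the same vec1 as above:

```r
vec1 <- c(5, 16, 33, 23, 24, 40, 35, 17)

vec1[2:4]        # elements 2 to 4: 16 33 23
vec1[c(1, 8)]    # first and last: 5 17
vec1[-1]         # everything except the first element
vec1[vec1 > 30]  # elements greater than 30: 33 40 35
```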
Lists in R studio:

Lists are R objects which can contain elements of different types, such as numbers, strings, vectors,
and other lists. A list is created using the list() function.

> l1=list(1,"a",TRUE)

> l1

[[1]]

[1] 1

[[2]]

[1] "a"

[[3]]

[1] TRUE

> class(l1[[1]])

[1] "numeric"

> class(l1[[2]])

[1] "character"

> class(l1[[3]])

[1] "logical"

List of Vectors

> l2=list(c(1,2,3),c("a","b","c"),c(T,F,T))

> l2

[[1]]

[1] 1 2 3

[[2]]

[1] "a" "b" "c"

[[3]]

[1] TRUE FALSE TRUE

> l2[[2]][1]

[1] "a"

> l2[[1]][3]

[1] 3
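
List elements can also be given names, which allows access with $ instead of a position. This is a hedged sketch; the list student and its contents are hypothetical and only illustrate the idea.

```r
# A list whose elements have names (names and values are illustrative).
student <- list(name = "sumit", marks = c(8, 10, 12), feepaid = TRUE)

student$name          # "sumit"
student[["marks"]]    # 8 10 12
student$marks[2]      # 10
length(student)       # 3
```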
Matrices in R studio:

In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created with the help
of the vector input to the matrix function. On R matrices, we can perform addition, subtraction,
multiplication, and division operation.

In the R matrix, elements are arranged in a fixed number of rows and columns. All elements of a
matrix share the same data type, which is numeric in the examples here.

> m1=matrix(c(6,5,4,3,2,1))

> m1

[,1]

[1,] 6

[2,] 5

[3,] 4

[4,] 3

[5,] 2

[6,] 1

> m1=matrix(c(1,2,3,4,5,6),nrow=2,ncol=3)

> m1

[,1] [,2] [,3]

[1,] 1 3 5

[2,] 2 4 6

> m1=matrix(c(1,2,3,4,5,6),nrow=2,ncol=3,byrow=T)

> m1

[,1] [,2] [,3]

[1,] 1 2 3

[2,] 4 5 6
> m1[1,2]

[1] 2
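
The arithmetic on matrices mentioned at the start of this section can be sketched as follows. Note that +, -, * and / work element by element, while %*% performs true matrix multiplication. The matrix m2 is an assumption added for the example.

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
m2 <- matrix(1, nrow = 2, ncol = 3)   # a 2 x 3 matrix of ones (illustrative)

m1 + m2       # element-wise addition
m1 * 2        # multiply every element by 2
dim(m1)       # 2 3
t(m1)         # transpose: a 3 x 2 matrix
m1 %*% t(m1)  # matrix product: a 2 x 2 matrix
```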
Array in R studio:

An array is a data structure that can hold multi-dimensional data. In R, an array object can hold
data in more than two dimensions, for example a stack of several two-dimensional matrices. Arrays are
also called vector structures. A vector is an array of numbers with a single index while a matrix is
an array of numbers with two indices.

▶ Uni-dimensional arrays are called vectors with the length being their only dimension.

▶ Two-dimensional arrays are called matrices, consisting of fixed numbers of rows and
columns.

▶ Arrays consist of all elements of the same data type.

▶ An array in R can be created with the use of array() function.

> vec1=c(1,2,3,4,5,6)

> vec2=c(7,8,9,10,11,12)

> a1=array(c(vec1,vec2),dim=c(2,3,2))

> a1

,,1

[,1] [,2] [,3]

[1,] 1 3 5

[2,] 2 4 6

,,2

[,1] [,2] [,3]

[1,] 7 9 11

[2,] 8 10 12
> a1[1,2,1]

[1] 3

> a1[1,1,2]

[1] 7

> a1[2,3,2]

[1] 12
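
Leaving an index empty selects everything along that dimension, which makes it possible to pull out whole layers or rows. A minimal sketch, building the same values as a1 above with the shorthand 1:12:

```r
a1 <- array(1:12, dim = c(2, 3, 2))

dim(a1)      # 2 3 2
a1[, , 2]    # the second 2 x 3 layer
a1[2, , 1]   # row 2 of the first layer: 2 4 6
sum(a1)      # 78
```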
Data Frame

> fruits=data.frame(fruit_name=c("apple","banana","mango"),fruit_cost=c(100,200,300))

> fruits

fruit_name fruit_cost

1 apple 100

2 banana 200

3 mango 300

> fruits$fruit_cost

[1] 100 200 300

> fruits$fruit_name

[1] "apple" "banana" "mango"
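
Rows and columns of a data frame can be selected with [row, column] indexing, and new columns can be added with $. The sketch below reuses the fruits data frame; the cost_with_tax column and the 5% rate are hypothetical.

```r
fruits <- data.frame(fruit_name = c("apple", "banana", "mango"),
                     fruit_cost = c(100, 200, 300))

nrow(fruits)                        # 3 rows
fruits[2, ]                         # the second row
fruits[fruits$fruit_cost > 150, ]   # rows costing more than 150

# Hypothetical extra column: cost plus an assumed 5% tax.
fruits$cost_with_tax <- fruits$fruit_cost * 1.05
mean(fruits$fruit_cost)             # 200
```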


User defined functions in R:

▶ A function is a set of statements organized together to perform a specific task. R has a large
number of in-built functions and the user can create their own functions.

▶ We can create user-defined functions in R. They are specific to what a user wants and once
created they can be used like the built-in functions.

▶ The user-defined functions can be explained as below:

> times10=function(x)

print(x*10)

> times10(5)

[1] 50

> new.function=function(a){for (i in 1:a) {b=i^2

print(b)}}

> new.function(6)

[1] 1

[1] 4

[1] 9

[1] 16

[1] 25

[1] 36

> new.function=function(a,b,c)

{result=a*b+c

print(result)}
> new.function(2,3,4)
[1] 10

> new.function(5,2,4)

[1] 14

> new.function(a=7,b=5,c=3)

[1] 38

▶ By doing this we have generated our own function in R studio which is a*b+c, we can also
generate many other functions like this to increase the efficiency of our work and save time.
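
The functions above print their result. When the result is needed in a later calculation, it is usually more convenient to return it. The following sketch assumes a hypothetical function multiply_add with the same a*b+c formula and a default value for c:

```r
# Hypothetical helper: same a*b+c formula, but returning the value.
multiply_add <- function(a, b, c = 0) {
  result <- a * b + c
  return(result)
}

multiply_add(2, 3, 4)                # 10
multiply_add(2, 3)                   # 6, because c defaults to 0
total <- multiply_add(7, 5, 3) + 2   # the returned value can be reused: 40
```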
Simple programming constructs such as if… else, for, while, break.

When we’re programming in R (or any other language, for that matter), we often want to control
when and how particular parts of our code are executed. We can do that using control structures like
if-else statements, for loops, and while loops.

▶ The IF Conditional Statement: Let’s say we’re watching a sports match that decides
which team makes the playoffs. We could visualize the possible outcomes using this tree
chart:

IF STATEMENT:

▶ As we can see in the tree chart, there are only two possible outcomes. If Team A wins, they go
to the playoffs. If Team B wins, then they go.

> teamA=5

> teamB=3

> if(teamA>teamB)

{print("Team A Wins")}

[1] "Team A Wins"

R prints "Team A Wins" because the condition is TRUE: 5 is greater than 3.

ELSE STATEMENT:

What if Team A had 1 goal and Team B had 3 goals? Our teamA > teamB condition would
evaluate to FALSE. As a result, nothing would be printed if we ran our code: because the
condition is false, the code block inside the if statement is not executed. In this
case, an else block supplies the code that should run instead.

> teamA=1

> teamB=3

> if(teamA>teamB)

{print("Team A Wins")}else{print("Team B wins")}

[1] "Team B wins"
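
When there are more than two outcomes, conditions can be chained with else if. The sketch below adds a draw to the example; the function name match_result is hypothetical.

```r
# Hypothetical helper covering win, loss, and draw.
match_result <- function(teamA, teamB) {
  if (teamA > teamB) {
    "Team A wins"
  } else if (teamA < teamB) {
    "Team B wins"
  } else {
    "Draw"
  }
}

match_result(5, 3)  # "Team A wins"
match_result(1, 3)  # "Team B wins"
match_result(2, 2)  # "Draw"
```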

FOR LOOP:

It is a type of control statement that enables one to easily construct a loop that has to run
statements or a set of statements multiple times. For loop is commonly used to iterate over
items of a sequence.

for(value in sequence){statement}

> for(val in 1:4){print(val)}

[1] 1

[1] 2

[1] 3

[1] 4

Program to print first 10 natural numbers

> for(val in 1:10){print(val)}

[1] 1

[1] 2

[1] 3

[1] 4

[1] 5

[1] 6

[1] 7

[1] 8

[1] 9

[1] 10

Program to print square of first 10 natural numbers

> for(val in 1:10){print(val*val)}

[1] 1

[1] 4

[1] 9

[1] 16

[1] 25

[1] 36

[1] 49

[1] 64

[1] 81

[1] 100

Program to print table of a number:

> for(val in 1:9){print(2*val)}

[1] 2

[1] 4

[1] 6

[1] 8

[1] 10

[1] 12
[1] 14

[1] 16

[1] 18

Print Days of Week

> week=c("sunday","monday","tuesday","wednesday","thursday","friday","saturday")

> for (days in week) {print(days)}

[1] "sunday"

[1] "monday"

[1] "tuesday"

[1] "wednesday"

[1] "thursday"

[1] "friday"

[1] "saturday"
While LOOP:

It is a type of control statement which runs a statement or a set of statements repeatedly
until the given condition becomes false. A while loop in R is a close cousin of the for loop
in R. However, a while loop checks a logical condition before each pass, and keeps running the
loop as long as the condition is true.

while(condition){statement}

▶ If the condition in the while loop in R is always true, the while loop will be an infinite loop, and
our program will never stop running.

Program to print first 10 natural numbers

> i=1

> while(i<=10)

{print(i)

i=i+1}

[1] 1

[1] 2

[1] 3

[1] 4

[1] 5

[1] 6

[1] 7

[1] 8

[1] 9

[1] 10

▶ Example: Let’s take a team that’s starting the season with zero wins. They’ll need to win 10
matches to make the playoffs. We can write a while loop that repeats for as long as the team
has fewer than 10 wins:

> wins=0

> while(wins<10)

{print("will not win")

wins=wins+1}

The loop runs until the condition wins<10 becomes false, that is, until wins reaches 10,
so the message is printed 10 times.
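
A variant of the same loop can report the running total and confirm the outcome once the loop has finished. This is only a sketch; the messages are illustrative.

```r
wins <- 0
while (wins < 10) {
  wins <- wins + 1                  # one more match won
  cat("Wins so far:", wins, "\n")   # progress message
}

# The loop ends only when wins < 10 is FALSE, so wins is now 10.
if (wins >= 10) print("Team makes the playoffs")
```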

Break statement in R:

Break statement is used to terminate the loop

> i=1

> while(i<=10)

{print(i)

if(i==4)

break

i=i+1}

[1] 1

[1] 2

[1] 3
[1] 4

Command To Exit R Studio

> q()
Inbuilt Functions of R

> View(iris)

Show Data of Data Frame

> str(iris)

It will show the structure of Data Frame

'data.frame': 150 obs. of 5 variables:

$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...

$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...

$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...

$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

> head(iris)   # show the first 6 records

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

1 5.1 3.5 1.4 0.2 setosa


2 4.9 3.0 1.4 0.2 setosa

3 4.7 3.2 1.3 0.2 setosa

4 4.6 3.1 1.5 0.2 setosa

5 5.0 3.6 1.4 0.2 setosa

6 5.4 3.9 1.7 0.4 setosa

> head(iris,3)   # show the first 3 records

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

1 5.1 3.5 1.4 0.2 setosa

2 4.9 3.0 1.4 0.2 setosa

3 4.7 3.2 1.3 0.2 setosa

> head(iris,10)   # show the first 10 records

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa

> tail(iris)   # show the last 6 records

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

145 6.7 3.3 5.7 2.5 virginica


146 6.7 3.0 5.2 2.3 virginica

147 6.3 2.5 5.0 1.9 virginica

148 6.5 3.0 5.2 2.0 virginica

149 6.2 3.4 5.4 2.3 virginica

150 5.9 3.0 5.1 1.8 virginica

> tail(iris,3)   # show the last 3 records

Sepal.Length Sepal.Width Petal.Length Petal.Width Species

148 6.5 3.0 5.2 2.0 virginica

149 6.2 3.4 5.4 2.3 virginica

150 5.9 3.0 5.1 1.8 virginica

> table(iris$Species)

It will show the frequency of each value in the field.

    setosa versicolor  virginica
        50         50         50

> min(iris$Sepal.Length)

[1] 4.3

It will show the minimum value of a field

> max(iris$Sepal.Length)

[1] 7.9

It will show the maximum value of a field

> range(iris$Sepal.Length)

[1] 4.3 7.9

It will show the range (min & max) of values of a field

> mean(iris$Sepal.Length)
[1] 5.843333

It will show the average of a field.


Mean, Median and Mode

> marks=c(8,10,12,15,20,7,6,5,8,3,2,12,8,9,7,15,8)

Mean:

> mean(marks)

[1] 9.117647

> mean(iris$Sepal.Length)

[1] 5.843333

Median:

> median(marks)

[1] 8

> median(iris$Sepal.Length)

[1] 5.8

> table(marks)

It will show variable and its frequency.

marks

2 3 5 6 7 8 9 10 12 15 20

1 1 1 1 2 4 1 1 2 2 1

Mode:

> names(sort(-table(marks))[1])

[1] "8"

> names(sort(-table(iris$Sepal.Length))[1])

[1] "5"
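
Base R has no built-in function for the statistical mode, which is why the expression above is built from table() and sort(). One possible way to wrap the idea in a reusable function is sketched below; get_mode is a hypothetical name, and unlike the one-liner it returns every value that ties for the highest frequency.

```r
# Hypothetical helper: returns all most-frequent values of x (as strings).
get_mode <- function(x) {
  freq <- table(x)
  names(freq)[as.vector(freq) == max(freq)]
}

marks <- c(8, 10, 12, 15, 20, 7, 6, 5, 8, 3, 2, 12, 8, 9, 7, 15, 8)
get_mode(marks)             # "8"
get_mode(c(1, 1, 2, 2, 3))  # "1" "2" (a tie)
```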
Summary statistics

• R provides a wide range of functions for obtaining summary statistics. One method of
obtaining descriptive statistics is to call the summary(x) function on a vector or a data
frame column x.

• summary() is part of base R, so nothing needs to be installed to use it. The fBasics package
can be installed separately if further descriptive statistics are wanted.

• It calculates the descriptive statistics of the whole data series; the values
reported are:

o Mean

o Median

o Minimum

o Maximum

o 1st and 3rd quartile

> marks=c(8,10,12,15,20,7,6,5,8,3,2,12,8,9,7,15,8)

> summary(marks)

Min. 1st Qu. Median Mean 3rd Qu. Max.

2.000 7.000 8.000 9.118 12.000 20.000

> summary(iris$Sepal.Length)

Min. 1st Qu. Median Mean 3rd Qu. Max.

4.300 5.100 5.800 5.843 6.400 7.900
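
summary() does not report the spread of the data. As a complement, the base R functions sd(), var(), quantile() and IQR() can be used; the sketch below applies them to the same marks vector.

```r
marks <- c(8, 10, 12, 15, 20, 7, 6, 5, 8, 3, 2, 12, 8, 9, 7, 15, 8)

sd(marks)                        # sample standard deviation
var(marks)                       # sample variance (sd squared)
quantile(marks, c(0.25, 0.75))   # 1st and 3rd quartiles: 7 and 12
IQR(marks)                       # interquartile range: 5
```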


Quick Plots

• To present the data as a simple plot we just need to write the command plot(x), where x is
a numeric vector or a column of a data frame.

• This draws a basic scatter plot of the values against their position (index) in the
data:

> marks=c(8,10,12,15,20,7,6,5,8,3,2,12,8,9,7,15,8)

> plot(marks)

> plot(iris$Sepal.Length)
Colored Quick plots:-

• We can also get a colored version of our plots by adding the col argument to the
command: plot(x, col=1)

• Color codes: 1- Black

2- Red

3- Green

4- Blue

5- Cyan (aqua)

6- Magenta (pink)

> marks=c(8,10,12,15,20,7,6,5,8,3,2,12,8,9,7,15,8)

> plot(marks,col=2)
Histogram

🞅 A histogram is a graph that shows the frequency of numerical data using rectangles.

🞅 The height of a rectangle (the vertical axis) represents the distribution frequency of a variable
(the amount, or how often that variable appears).

🞅 The width of the rectangle (horizontal axis) represents the range of values of the variable
covered by that rectangle (for instance, minutes, years, or ages).

🞅 The histogram displays the distribution frequency as a two-dimensional figure, meaning the
height and width of columns or rectangles have particular meanings and can vary. A bar chart
is a one-dimensional figure. The height of its bars represents something specific.

🞅 To draw a histogram use the command hist(x), where x is a numeric vector.

🞅 To draw a colored histogram use the command hist(x, col="red")

🞅 To add a label to the horizontal axis add the argument xlab= to the above command.

🞅 To add a heading to the histogram add the argument main= to the above command.

> marks=c(8,10,12,15,20,7,6,5,8,3,2,12,8,9,7,15,8)

> hist(iris$Sepal.Length)
> hist(marks)
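
The col, xlab and main arguments described above can be combined in one call. The following is a minimal sketch; the label and title text are illustrative. hist() also returns the bin boundaries and counts it used, which can be stored and inspected.

```r
marks <- c(8, 10, 12, 15, 20, 7, 6, 5, 8, 3, 2, 12, 8, 9, 7, 15, 8)

# col, xlab and main are arguments of hist(); the text values are illustrative.
h <- hist(marks,
          col  = "red",
          xlab = "Marks",
          main = "Histogram of marks")

h$breaks  # the bin boundaries R chose
h$counts  # how many marks fall in each bin
```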

PIE CHARTS

🞅 A pie chart (or a circle chart) is a circular statistical graphic, which is divided into slices to
illustrate numerical proportions.
🞅 In a pie chart, the arc length of each slice (and consequently its central angle
and area) is proportional to the quantity it represents.

🞅 Pie charts are created with the function pie(x, labels=) where x is a non-negative
numeric vector indicating the area of each slice and labels= notes a character
vector of names for the slices.

> slices <- c(10, 12,4, 16, 8)

> lbls <- c("US", "UK", "Australia", "Germany", "France")

> pie(slices, labels = lbls, main="Pie Chart of Countries")
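
Slice labels are often easier to read when they include the percentage share. One way to build such labels with paste0() is sketched below; the names pct and pct_lbls and the use of rainbow() for colors are choices made for this example.

```r
slices <- c(10, 12, 4, 16, 8)
lbls   <- c("US", "UK", "Australia", "Germany", "France")

pct      <- round(slices / sum(slices) * 100)   # 20 24 8 32 16
pct_lbls <- paste0(lbls, " ", pct, "%")          # e.g. "US 20%"

pie(slices, labels = pct_lbls, col = rainbow(length(slices)),
    main = "Pie Chart of Countries")
```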
