R Unit 4

The document covers various probability distributions, including binomial, Poisson, and normal distributions, detailing their definitions, formulas, and applications. It provides R functions for calculating probabilities and examples of problems solved using these distributions. Additionally, it discusses correlation as a relationship between two variables.

Uploaded by

Ganesh
Copyright
© © All Rights Reserved

UNIT 4

PROBABILITY DISTRIBUTION:
Probability distribution of a random variable (x) shows how the probabilities
of the events are distributed over different values of the random variable. When all
values of a random variable are aligned on a graph, the values of its probabilities
generate a shape.

Binomial Distribution:
The binomial distribution is a discrete probability distribution. It describes the
outcome of n independent trials in an experiment. Each trial is assumed to have only
two outcomes, either success or failure. If the probability of a successful trial is p,
then the probability of having x successful outcomes in an experiment of n
independent trials is as follows.
$$P(x) = \frac{n!}{(n-x)!\,x!}\, p^{x} q^{n-x}$$

where q = 1 − p is the probability of failure on a single trial.

n! : This starts the count of the number of ways the event can occur
(n−x)! : This ends the count of the number of ways the event can occur
x! : This removes duplication
p^x : This is the probability of success in x trials
q^(n−x) : This is the probability of failure in the remaining n − x trials

R has four built-in functions for the binomial distribution. They are:

• dbinom(x,size,prob) : This function gives the probability mass function, that is, P(X = x) at each point x
• pbinom(q,size,prob) : This function gives the cumulative probability P(X ≤ q). It is a single value for each q
• qbinom(p,size,prob) : This function takes a probability value and gives the smallest number whose cumulative probability is at least that value
• rbinom(n,size,prob) : This function generates the required number of random values from a binomial distribution with the given size and probability

Following is the description of the parameters used:

• x, q is a vector of numbers (counts of successes)
• p is a vector of probabilities
• n is the number of observations
• size is the number of trials
• prob is the probability of success of each trial
Problem: Suppose a die is rolled 5 times. What is the probability of getting exactly
2 fours?

Solution: The number of trials: 5
The number of successes: 2
The probability of success on a single trial is 1/6 or about 0.167

P(x=2) = 5C2 * (0.167)^2 * (0.833)^3
= 10 * 0.0279 * 0.578
≈ 0.161

R code: dbinom(2,size=5,prob=0.167)
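As a quick check, the binomial formula can be evaluated by hand in R with choose() and compared against dbinom(). This is a minimal sketch; it uses the exact probability 1/6 instead of the rounded 0.167, so the result differs slightly from the rounded hand calculation.

```r
# Exactly 2 fours in 5 rolls of a fair die
n <- 5
x <- 2
p <- 1 / 6                                   # exact probability of a four

manual  <- choose(n, x) * p^x * (1 - p)^(n - x)  # nCx * p^x * q^(n-x)
builtin <- dbinom(x, size = n, prob = p)

print(c(manual = manual, builtin = builtin))     # both are about 0.1608
```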

Types of problems:
• dbinom: It is used to find the probability of a specific number of successes in a fixed number of independent Bernoulli trials
• pbinom: It gives the probability that the number of successes is less than or equal to a specified value (with lower.tail=FALSE it gives the probability of more than that value)
• qbinom: It gives the smallest value for which the cumulative probability is greater than or equal to a specified probability
• rbinom: It is used to simulate a random experiment based on the binomial distribution
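The four functions can be tried side by side on one small case. The values size = 10 and prob = 0.5 below are illustrative assumptions, not taken from a problem in these notes.

```r
size <- 10    # hypothetical number of trials
prob <- 0.5   # hypothetical probability of success

p_exact  <- dbinom(3, size, prob)                      # P(X = 3)  = 120/1024
p_atmost <- pbinom(3, size, prob)                      # P(X <= 3) = 176/1024
p_more   <- pbinom(3, size, prob, lower.tail = FALSE)  # P(X > 3)
q_median <- qbinom(0.5, size, prob)                    # smallest x with P(X <= x) >= 0.5

set.seed(1)                                            # make the simulation repeatable
draws <- rbinom(5, size, prob)                         # five simulated experiments

print(c(p_exact, p_atmost, p_more, q_median))
print(draws)
```

Note that p_atmost and p_more always add up to 1, which is why lower.tail=FALSE is a convenient way to get "more than" probabilities.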

Problem: In a restaurant, seventy percent of people order Chinese food and thirty
percent order Italian food. A group of three people enters the restaurant. Find
the probability of at least 2 of them ordering Italian food.

Solution: The probability of Chinese food: 70% = 0.7
The probability of Italian food: 30% = 0.3
If at least two of them order Italian food, then
either two or three of them order Italian food.

P(x=2) = 3C2 * (0.3)^2 * (0.7)^1
= 3*0.09*0.7
= 0.189

P(x=3) = 3C3 * (0.3)^3 * (0.7)^0
= 1*0.027*1
= 0.027

Hence the probability of at least two people ordering Italian food is
P(x>=2) = P(x=2) + P(x=3)
= 0.189 + 0.027
= 0.216

R code: dbinom(2, size=3, prob=0.3) + dbinom(3, size=3, prob=0.3)
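The same "at least 2" probability can also be obtained from the cumulative function, since P(x>=2) = 1 − P(x<=1). The sketch below compares the three routes; the variable names are only for illustration.

```r
# At least 2 of 3 people order Italian food, p = 0.3
p_sum   <- dbinom(2, size = 3, prob = 0.3) + dbinom(3, size = 3, prob = 0.3)
p_compl <- 1 - pbinom(1, size = 3, prob = 0.3)                   # complement of P(X <= 1)
p_upper <- pbinom(1, size = 3, prob = 0.3, lower.tail = FALSE)   # P(X > 1) directly

print(c(p_sum, p_compl, p_upper))   # all three equal 0.216
```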


Cumulative binomial probability:
A cumulative binomial probability refers to the probability that the binomial
random variable falls within a specified range (e.g., is greater than or equal to a
stated lower limit and less than or equal to a stated upper limit)

Problem: What is the probability of obtaining 45 or fewer heads in 100 tosses of a
coin?

Solution: To solve this problem, we compute 46 individual probabilities (for x = 0, 1, ..., 45) using
the binomial formula. The sum of all these probabilities is the answer we seek.
Thus,
b(x<=45,100,0.5) = b(x=0,100,0.5) + b(x=1,100,0.5) + ... + b(x=45,100,0.5)

R code: pbinom(45, size=100, prob=0.5)

Problem: Suppose there are twelve multiple choice questions in an English class quiz.
Each question has five possible answers, and only one of them is correct. Find
the probability of having four or fewer correct answers if a student attempts to
answer every question at random.

Solution: Since only one out of five possible answers is correct, the probability of
answering a question correctly at random is 1/5 = 0.2

The probability of having exactly 4 correct answers by random
attempts is found as follows:
dbinom(4,size=12,prob=0.2)

To find the probability of having four or fewer correct answers by random
attempts, we apply the function dbinom with x = 0,1,...,4

R code:
dbinom(0,size=12,prob=0.2)+dbinom(1,size=12,prob=0.2)+dbinom(2,size=12,
prob=0.2)+dbinom(3,size=12,prob=0.2)+dbinom(4,size=12,prob=0.2)

Alternatively, we can use the cumulative probability function for the binomial
distribution, pbinom:
pbinom(4,size=12,prob=0.2)

Poisson Distribution:
The Poisson distribution is the probability distribution of independent event
occurrences in an interval. If λ is the mean number of occurrences per interval, then the
probability of having x occurrences within a given interval is:

$$p(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

Where: p(x): probability of x occurrences given λ
λ: Expected (mean) number of successes
e: 2.71828 (base of natural logarithms)
x: Number of successes per unit

Mean: μ = E(x) = λ
Standard deviation: σ = √λ

Examples:
1. The number of defective electric bulbs manufactured by a reputed company
2. The number of telephone calls per minute at a switch board
3. The number of cars passing a certain point in one minute
4. The number of printing mistakes per page in a large text

R has four built-in functions for the Poisson distribution. They are described
below:
• dpois(x,lambda,log=FALSE) : This function gives the probability mass function, that is, P(X = x) at each point x
• ppois(q,lambda,lower.tail=TRUE,log.p=FALSE) : This function gives the cumulative probability P(X ≤ q), or P(X > q) when lower.tail=FALSE
• qpois(p,lambda,lower.tail=TRUE,log.p=FALSE) : This function takes a probability value and gives the smallest number whose cumulative probability is at least that value
• rpois(n,lambda) : This function generates the required number of random values from a Poisson distribution with the given mean

Following is the description of the parameters used:

• x, q is a vector of numbers (counts of occurrences)
• p is a vector of probabilities
• n is the number of observations
• lambda is the mean number of occurrences per interval

Problem: If there are twelve cars crossing a bridge per minute on average, find the
probability of having seventeen or more cars crossing the bridge in a
particular minute.

Solution: The probability of having 16 or fewer cars crossing the bridge in a particular
minute is given by the function ppois:

ppois(16,lambda=12) #lower tail

Hence the probability of having seventeen or more cars crossing the bridge in
a minute is in the upper tail of the probability distribution:
ppois(16,lambda=12,lower.tail=FALSE)
Problem: The average number of homes sold by the Acme Realty company is 2
homes per day. What is the probability that exactly 3 homes will be sold
tomorrow?
Solution: This is a Poisson experiment in which we know the following:
μ = 2, since 2 homes are sold per day, on average
x = 3, since we want to find the likelihood that 3 homes will be sold tomorrow
e = 2.71828, since e is a constant equal to approximately 2.71828

Poisson formula:
P(x;μ) = (e^-μ)(μ^x)/x!
P(3;2) = (2.71828^-2)(2^3)/3!
= (0.13534)(8)/6
= 0.180

R code:
dpois(3,lambda=2)

Cumulative Poisson distribution:


A cumulative Poisson distribution probability refers to the probability that
the Poisson random variable is greater than some specified lower limit and less
than some specified upper limit.

Problem: Suppose the average number of lions seen on a 1-day safari is 5. What is the
probability that tourists will see fewer than four lions on the next 1-day safari?

Solution: This is a Poisson experiment in which we know the following:
μ = 5, since 5 lions are seen per safari, on average
x = 0, 1, 2 or 3, since we want to find the likelihood that tourists will see fewer
than 4 lions, that is, we want the probability that they will see 0, 1, 2 or 3 lions
e = 2.71828, since e is a constant equal to approximately 2.718

To solve this problem, we need to find the probability that tourists will see
0, 1, 2 or 3 lions. Thus, we need to calculate the sum of four probabilities:
P(x<=3;5) = P(0;5)+P(1;5)+P(2;5)+P(3;5)
= [(e^-5)(5^0)/0!] + [(e^-5)(5^1)/1!] + [(e^-5)(5^2)/2!] + [(e^-5)(5^3)/3!]
= [0.0067] + [0.03369] + [0.084224] + [0.140375]
= 0.2650
Thus the probability of seeing no more than 3 lions is 0.2650

R code:
>ppois(3,lambda=5)
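The cumulative value is just the sum of the individual Poisson probabilities. A short sketch for the lion example, assuming nothing beyond the functions already described:

```r
lambda <- 5                              # mean number of lions per safari

terms  <- dpois(0:3, lambda)             # P(0), P(1), P(2), P(3)
p_sum  <- sum(terms)                     # sum of the four probabilities
p_cum  <- ppois(3, lambda)               # cumulative probability P(X <= 3)

print(round(terms, 5))
print(c(p_sum, p_cum))                   # both are about 0.2650
```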

Normal Distribution:
A continuous random variable x follows a normal distribution with mean μ
and variance σ^2 if its probability density function is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Standard normal distribution:
It is the distribution that occurs when a normal random variable has a mean
of zero and a standard deviation of one.
The normal random variable of a standard normal distribution is called a standard
score or a Z score. Every normal random variable x can be transformed into a Z score
via the following equation:
$$z = \frac{x-\mu}{\sigma}$$
where x is a normal random variable, μ is the mean, and σ is the standard deviation,
yielding
$$P(z)\,dz = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$$

Standard normal curve:


One way of figuring out how data are distributed is to plot them in a graph. If
the data are distributed symmetrically around the mean, you may come up with a bell curve. A bell curve has a
small percentage of the points on both tails and the bigger percentage on the inner
part of the curve. The standard normal curve has the following properties:

• Mean = median = mode
• Symmetry about the center
• 50% of values less than the mean and 50% greater than the mean
R functions:
• dnorm(x,mean=0,sd=1,log=FALSE) : This function gives the probability density (the height of the curve) at each point
• pnorm(q,mean=0,sd=1,lower.tail=TRUE,log.p=FALSE) : This function gives the cumulative probability P(X ≤ q), or P(X > q) when lower.tail=FALSE
• qnorm(p,mean=0,sd=1,lower.tail=TRUE,log.p=FALSE) : This function takes a probability value and gives the number whose cumulative probability matches that value
• rnorm(n,mean=0,sd=1) : This function generates the required number of random values from a normal distribution with the given mean and standard deviation

Procedure to find probability using positive Z score table
(Area(z) is the table value, that is, the area between 0 and z)

Case    Region                                                    Probability
Case 1  Area between 0 and any Z score                            Area(z)
Case 2  Area in any tail                                          0.5 − Area(z)
Case 3  Area between two Z scores on the same side of the mean    |Area(z2) − Area(z1)|
Case 4  Area between two Z scores on opposite sides of the mean   Area(z1) + Area(z2)
Case 5  Area to the left of a positive Z score                    0.5 + Area(z)
Case 6  Area to the right of a negative Z score                   0.5 + Area(|z|)
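The table cases can be cross-checked against pnorm(). In the sketch below, area() is a hypothetical helper that mimics a positive Z table lookup (area between 0 and |z|), and z1 and z2 are illustrative values.

```r
# area(z): area between 0 and |z|, as read from a positive Z score table
area <- function(z) pnorm(abs(z)) - 0.5

z1 <- 1.25   # illustrative Z scores
z2 <- 2.5

case1 <- area(z1)                   # between 0 and z1
case2 <- 0.5 - area(z1)             # tail beyond z1
case3 <- abs(area(z2) - area(z1))   # between z1 and z2 (same side)
case4 <- area(-z1) + area(z2)       # between -z1 and z2 (opposite sides)
case5 <- 0.5 + area(z1)             # left of positive z1
case6 <- 0.5 + area(-z1)            # right of negative -z1

# Direct pnorm() equivalents, printed next to the table-based values
print(c(case2, pnorm(z1, lower.tail = FALSE)))
print(c(case3, pnorm(z2) - pnorm(z1)))
print(c(case4, pnorm(z2) - pnorm(-z1)))
print(c(case5, pnorm(z1)))
print(c(case6, pnorm(-z1, lower.tail = FALSE)))
```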
Problem: X is a normally distributed variable with mean μ = 30 and standard
deviation σ = 4. Find
(a) P(x<40)
(b) P(x>21)
(c) P(30<x<35)
Solution: (a) For x=40,
Z = (x − μ)/σ
= (40−30)/4
= 2.5 (= z1, say)
Hence P(x<40) = P(z<2.5)
= 0.5 + A(z1)
= 0.5 + 0.4938
= 0.9938

(b) For x=21,
Z = (x − μ)/σ
= (21−30)/4
= −2.25 (= z1, say)
Hence P(x>21) = P(z>−2.25)
= 0.5 + A(2.25)
= 0.5 + 0.4878
= 0.9878

(c) For x=30,
Z = (x − μ)/σ
= (30−30)/4
= 0
For x=35,
Z = (x − μ)/σ
= (35−30)/4
= 1.25
Hence P(30<x<35) = P(0<z<1.25)
= A(1.25)
= 0.3944
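The three parts can be checked with pnorm(), which takes the mean and standard deviation directly so no conversion to Z scores is needed. A minimal sketch:

```r
mu    <- 30
sigma <- 4

p_a <- pnorm(40, mean = mu, sd = sigma)                      # (a) P(x < 40)
p_b <- pnorm(21, mean = mu, sd = sigma, lower.tail = FALSE)  # (b) P(x > 21)
p_c <- pnorm(35, mean = mu, sd = sigma) -
       pnorm(30, mean = mu, sd = sigma)                      # (c) P(30 < x < 35)

print(round(c(a = p_a, b = p_b, c = p_c), 4))   # 0.9938 0.9878 0.3944
```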

Problem: The length of life of an instrument produced by a machine has a normal
distribution with a mean of 12 months and a standard deviation of 2 months.
Find the probability that an instrument produced by this machine will last
(a) Less than 7 months
(b) Between 7 and 12 months

Solution: μ = 12
σ = 2
(a) Less than 7 months
P(x<7)
For x=7,
Z = (x − μ)/σ
= (7−12)/2
= −2.5 (= z1, say)
Hence P(x<7) = P(z<−2.5)
= 0.5 − A(2.5)
= 0.5 − 0.4938
= 0.0062

R code: pnorm(7,mean=12,sd=2)

(b) Between 7 and 12 months
P(7<x<12)
For x=12,
Z = (x − μ)/σ
= (12−12)/2
= 0

For x=7,
Z = (x − μ)/σ
= (7−12)/2
= −2.5 (= z1, say)

Hence P(7<x<12) = P(−2.5<z<0)
= A(2.5)
= 0.4938
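Part (b) has no R code in the notes; one way to compute it, sketched here, is as the difference of two cumulative probabilities.

```r
# Instrument life: mean 12 months, standard deviation 2 months
p_less_7  <- pnorm(7, mean = 12, sd = 2)               # (a) P(x < 7)
p_7_to_12 <- pnorm(12, mean = 12, sd = 2) - p_less_7   # (b) P(7 < x < 12)

print(round(c(p_less_7, p_7_to_12), 4))   # 0.0062 0.4938
```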

Problem: Assume that the test scores of a college entrance exam fit a normal
distribution. Furthermore, the mean test score is 72, and the standard
deviation is 15.2. What is the percentage of students scoring 84 or more in
the exam?
Solution: We need the upper tail of the normal distribution above 84.
R code:
pnorm(84,mean=72,sd=15.2, lower.tail=FALSE)
The result is about 0.215, so roughly 21.5% of students score 84 or more.

CORRELATION:
A correlation is a relationship between two variables. Typically, we take x to
be the independent variable. We take y to be the dependent variable. Data is
represented by a collection of ordered pairs (x,y). The correlation coefficient is

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

This will always be a number between -1 and 1 (inclusive).

• If r is close to 1, we say that the variables are positively correlated. This means
there is likely a strong linear relationship between the two variables, with a
positive slope
• If r is close to -1, we say that the variables are negatively correlated. This means
there is likely a strong linear relationship between the two variables, with a
negative slope
• If r is close to 0, we say that the variables are not correlated. This means that
there is likely no linear relationship between the two variables. However, the
variables may still be related in some other way.
To run a correlation test we type:
cor.test(var1,var2,method="pearson")

The default method is "pearson", so you may omit this if that is what you want. If you
type "kendall" or "spearman" then you will get the corresponding significance test. The method names are written in lower case.

Problem: The local ice cream shop keeps track of how much ice cream they sell
versus the temperature on that day. Here are their figures for the last 12 days:

Temp                 14.2 16.4 11.9 15.2 18.5 22.1 19.4 25.1 23.4 18.1 22.6 17.2
Ice cream sales ($)  215  325  185  332  406  522  412  614  544  421  445  408

Formula for correlation coefficient:


R code:
> temp <- c(14.2,16.4,11.9,15.2,18.5,22.1,19.4,25.1,23.4,18.1,22.6,17.2)
> sales <- c(215,325,185,332,406,522,412,614,544,421,445,408)
> corr_coeff <- cor(temp,sales)
> corr_coeff
[1] 0.9575066
> cov(temp,sales)
[1] 484.0932
# Scatter plot of sales against temperature
> plot(temp, sales, pch=16, col="red")
# Adds a line of best fit to the scatter plot
> abline(lm(sales~temp), col="blue")
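To connect cor() with the correlation formula, the coefficient can also be computed term by term. This is a sketch on the ice cream data; the names dx, dy and r_manual are only for illustration.

```r
temp  <- c(14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2)
sales <- c(215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408)

dx <- temp  - mean(temp)     # deviations of x from its mean
dy <- sales - mean(sales)    # deviations of y from its mean

# Sum of cross-products divided by the square root of the product of sums of squares
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

print(r_manual)
print(cor(temp, sales))      # same value from the built-in function
```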

Covariance in R programming language:


In R programming, covariance can be measured using the cov() function.
Covariance is a statistical term used to measure the direction of the linear relationship
between two data vectors.
Mathematically,
$$\mathrm{Cov}(x,y) = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{N}$$
Where,
x represents the x data vector
y represents the y data vector
x̄ represents the mean of the x data vector
ȳ represents the mean of the y data vector
N represents the total number of observations
Note that R's cov() computes the sample covariance, which divides by N − 1 rather than N.

Syntax:
>cov(x,y,method)
Where,
x and y represent the data vectors
method defines the type of method to be used to compute covariance
Default is "pearson"

Ex:
>x<-c(1,3,5,10)
>y<-c(2,4,6,20)
>print(cov(x,y))
>print(cov(x,y,method="pearson"))
>print(cov(x,y,method="kendall"))
>print(cov(x,y,method="spearman"))
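The difference between dividing by N and by N − 1 is easy to see on the example vectors. The sketch below computes both by hand; only the N − 1 version matches cov().

```r
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)
n <- length(x)

sxy <- sum((x - mean(x)) * (y - mean(y)))   # sum of cross-products = 92

cov_population <- sxy / n         # divides by N:     23
cov_sample     <- sxy / (n - 1)   # divides by N - 1: about 30.667

print(c(cov_population, cov_sample, cov(x, y)))
```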

T Test for single mean:


One sample t-test is used to compare the mean of a population to a specified
theoretical mean (μ0).
Let x represent a set of values of size n, with sample mean x̄ and sample standard
deviation s. The comparison of the observed mean x̄ to the
theoretical value μ0 is performed with the formula below:

$$t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$$
To evaluate whether the difference is statistically significant, you first have to read from
the t-test table the critical value of Student's t distribution corresponding to the
significance level alpha of your choice (commonly 5%). The degrees of freedom (df) used in this
test are df = n−1.

Problem: A professor wants to know if her introductory statistics class has a good
grasp of basic math. Six students are chosen at random from the class and
given a math proficiency test. The professor wants the class to be able to
score above 70 on the test. The six students get scores of 62, 92, 75, 68, 83 and
95. Can the professor have 90 percent confidence that the mean score for the
class on the test would be above 70?
Solution:
• Null Hypothesis H0: μ = 70
• Alternative Hypothesis Ha: μ > 70
• Level of significance: α = 0.10 (90 percent confidence, one-tailed)
First, compute the sample mean and standard deviation:
x̄ = (62+92+75+68+83+95)/6
= 475/6
= 79.17 and standard deviation s = 13.17

The test statistic is
t = (x̄ − μ0)/(s/√n)
= (79.17 − 70)/(13.17/√6)
= 9.17/5.38
= 1.71 (calculated value of t)

To test the hypothesis, the computed t-value of 1.71 is compared to the
critical value in the t-table. With 5 df, the one-tailed critical value at α = 0.10 is 1.476.
The calculated value of t is more than the table value of t, so the null hypothesis is rejected.

R code:
> x <- c(62,92,75,68,83,95)
> t.test(x, alternative="greater", mu=70, conf.level=0.90)
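The hand calculation for the six test scores can be reproduced step by step. This sketch assumes a one-tailed test at 90 percent confidence, which is what the problem statement asks for; qt() gives the critical value that would otherwise be read from the t-table.

```r
x   <- c(62, 92, 75, 68, 83, 95)
mu0 <- 70
n   <- length(x)

t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))   # about 1.71
t_crit <- qt(0.90, df = n - 1)                  # one-tailed critical value, about 1.476

print(c(mean = mean(x), sd = sd(x), t = t_stat, critical = t_crit))
print(t_stat > t_crit)                          # TRUE means H0 is rejected

# The built-in test reports the same statistic
print(t.test(x, alternative = "greater", mu = mu0)$statistic)
```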
Problem: A sample of 26 bulbs gives a mean life of 990 hours with an S.D. of 20 hours.
The manufacturer claims that the mean life of bulbs is 1000 hours. Does the sample
meet the standard?
Solution:
Here n = 26
Sample mean x̄ = 990 hours
S.D. s = 20 hours
Population mean μ0 = 1000 hours
df = n−1
= 26−1
= 25

• Null Hypothesis H0: The sample meets the standard, i.e., μ = 1000 hours
• Alternative Hypothesis Ha: μ is not equal to 1000
• Level of significance α = 0.05
• The test statistic is

t = (x̄ − μ0)/(s/√n)
= (990 − 1000)/(20/√26)
= −10/3.92
= −2.55 (calculated value of t)

The two-tailed table value of t with 25 df at the 5% level is 2.060 (the one-tailed value is 1.708).
The calculated |t| = 2.55 is more than the table value of t, so the null hypothesis is
rejected at the 5% level.
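When only summary statistics are available, t.test() cannot be used directly, but the statistic is simple to compute by hand. A sketch for the bulb problem, assuming a two-tailed test at the 5% level:

```r
n    <- 26      # sample size
xbar <- 990     # sample mean (hours)
s    <- 20      # sample standard deviation (hours)
mu0  <- 1000    # claimed population mean (hours)

t_stat  <- (xbar - mu0) / (s / sqrt(n))     # about -2.55
df      <- n - 1
t_crit  <- qt(1 - 0.05 / 2, df)             # two-tailed critical value, about 2.060
p_value <- 2 * pt(-abs(t_stat), df)         # two-tailed p-value

print(c(t = t_stat, critical = t_crit, p = p_value))
print(abs(t_stat) > t_crit)                 # TRUE means H0 is rejected
```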

Paired Comparison:(paired t-test)


Sometimes data come from non-independent samples. An example might
be testing "before and after" of cosmetics or consumer products. We could use a
single random sample and do "before and after" tests on each person. A hypothesis
test based on these data would be called a paired comparisons test. Since the
observations come in pairs, we can study the difference d between the samples. The
difference between each pair of measurements is called di.

Test statistic: With a sample of n pairs of measurements, forming a simple
random sample from a normally distributed population, the mean of the differences,
d̄, is tested using the following implementation of t:

$$t = \frac{\bar{d}-\mu}{s/\sqrt{n}}$$

where μ is the hypothesized mean difference (0 under the null hypothesis) and s is the standard deviation of the differences.

Problem: The blood pressure of 5 women before and after intake of a certain drug
is given below. Test whether there is a significant change in blood pressure at
the 1% level of significance.
Before  110 120 125 132 125
After   120 118 125 136 121
Solution: Let μ be the mean of the population of differences
• Null Hypothesis H0: μ1 = μ2, i.e., no change in BP
• Alternative Hypothesis Ha: μ1 ≠ μ2, i.e., there is a change in BP
• Level of significance α = 0.01
• Computation: Differences di (before minus after) are −10, 2, 0, −4, 4

d̄ = (−10 + 2 + 0 + (−4) + 4)/5
= −8/5
= −1.6

s^2 = (1/(n−1)) Σ (di − d̄)^2
= (1/4) Σ (di − d̄)^2
= (1/4) [(−10 + 1.6)^2 + (2 + 1.6)^2 + (0 + 1.6)^2 + (−4 + 1.6)^2 + (4 + 1.6)^2]
= 123.20/4
= 30.8

s = √30.8 = 5.55

• Test statistic: The test statistic is t, which is calculated as

t = (d̄ − μ)/(s/√n)
= −1.6/(5.55/√5)
= −0.645
Calculated |t| value is 0.645
The tabulated two-tailed t at the 1% level with 5−1 = 4 degrees of freedom is 4.604 (the one-tailed value t0.01 is 3.747)
Since calculated |t| is less than the table value, we accept the null hypothesis and conclude that there
is no significant change in blood pressure

R code:
>x<-c(110,120,125,132,125)
>y<-c(120,118,125,136,121)
>t.test(x,y,paired=TRUE)

T-test for difference of two population means:


With a two-sample t-test, we compare the population means to each other and
again look at the difference. We expect that x̄ − ȳ would be close to μ1 − μ2. The
test statistic uses both sample means, sample standard deviations, and sample
sizes. A two-sample t-test proceeds as follows:
• Write the null and alternative hypotheses
• State the level of significance and find the critical value. The critical value,
from Student's t-distribution, has n1+n2−2 degrees of freedom
• Compute the test statistic
• Compare the test statistic to the critical value and state a conclusion

$$t = \frac{\bar{x}-\bar{y}}{S\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \sim t_{n_1+n_2-2}$$

Where
$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1+n_2-2}$$
(here s1^2 and s2^2 are the sample variances computed with divisors n1 and n2)
Or
$$S^2 = \frac{\sum_i (x_i-\bar{x})^2 + \sum_j (y_j-\bar{y})^2}{n_1+n_2-2}$$
Problem: Two horses A and B were tested according to the time (in seconds) to
run a particular track, with the following results:

Horse A  28 30 32 33 33 29 34
Horse B  29 30 30 24 27 29
Test whether the two horses have the same running capacity.

Solution:
Given n1 = 7 and n2 = 6
We first compute the sample means and standard deviations.
x̄ = Mean of the first sample
= (1/7)(28 + 30 + 32 + 33 + 33 + 29 + 34)
= (1/7)(219)
= 31.286

ȳ = Mean of the second sample
= (1/6)(29 + 30 + 30 + 24 + 27 + 29)
= (1/6)(169)
= 28.16

x          x−x̄      (x−x̄)^2    y      y−ȳ     (y−ȳ)^2
28         −3.286    10.8       29     0.84    0.7056
30         −1.286    1.6538     30     1.84    3.3856
32         0.714     0.51       30     1.84    3.3856
33         1.714     2.94       24     −4.16   17.3056
33         1.714     2.94       27     −1.16   1.3456
29         −2.286    5.226      29     0.84    0.7056
34         2.714     7.366
Total=219            31.4359    169            26.8336

Now: S^2 = [Σ(x_i − x̄)^2 + Σ(y_j − ȳ)^2]/(n1+n2−2)
= (31.4359 + 26.8336)/(7+6−2)
= 5.30

Therefore: S = √5.30
= 2.3

• Null Hypothesis H0: μ1 = μ2
• Alternative Hypothesis Ha: μ1 ≠ μ2
• Level of significance α = 0.05
• Computation:
t = (x̄ − ȳ)/(S √(1/n1 + 1/n2))
= (31.286 − 28.16)/((2.3) √(1/7 + 1/6))
= 2.443

The tabulated t0.05 with 7+6−2 = 11 degrees of freedom at the 5% level of significance is 2.2.
Since calculated t > t0.05, we reject the null hypothesis and conclude that the
two horses do not have the same running capacity.
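The notes give no R code for the horse problem. A sketch using t.test() with var.equal=TRUE, which requests the pooled-variance test with n1+n2−2 degrees of freedom, is shown below. Because R works with unrounded means, its statistic (about 2.44) differs slightly in the last digits from the hand calculation.

```r
horse_a <- c(28, 30, 32, 33, 33, 29, 34)
horse_b <- c(29, 30, 30, 24, 27, 29)

# Pooled-variance (equal variance) two-sample t-test, two-sided by default
result <- t.test(horse_a, horse_b, var.equal = TRUE)
print(result)

t_stat <- unname(result$statistic)     # calculated t
df     <- unname(result$parameter)     # degrees of freedom = 7 + 6 - 2
t_crit <- qt(0.975, df)                # two-tailed 5% critical value, about 2.2

print(c(t = t_stat, df = df, critical = t_crit))
```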

ANOVA:(Analysis of variance)
When we have only two samples we can use the t-test to compare the means
of the samples, but it might become unreliable in case of more than two samples. If
we only compare two means, then the t-test (independent samples) will give the
same results as the ANOVA. ANOVA is performed with an F-test.
Null Hypothesis H0: There are no differences among the mean values of the groups being compared (i.e.,
the group means are all equal)
H0: μ1 = μ2 = μ3 = ... = μk
Alternative Hypothesis H1 (conclusion if H0 is rejected):
Not all group means are equal (i.e., at least one group mean is different from the rest)

ANOVA one-way classification:


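Step 1: Grand total of all observations
$$T = \sum_i \sum_j x_{ij}$$
Step 2: Correction factor
$$CF = \frac{T^2}{N}$$
where N is the total number of observations
Step 3: Total sum of squares
$$TSS = \sum_i \sum_j x_{ij}^2 - CF$$
Step 4: Treatment sum of squares
$$TrSS = \sum_i \frac{T_i^2}{n_i} - CF$$
where T_i is the total and n_i the number of observations of the i-th treatment
Step 5: Error sum of squares
$$ESS = TSS - TrSS$$

Source of variation           df     Sum of squares              Mean sum of squares    F-test
Treatment (between samples)   K−1    TrSS = Σ T_i^2/n_i − CF     MSTr = TrSS/(K−1)      Fcal = MSTr/MSE
Error                         N−K    ESS = TSS − TrSS            MSE = ESS/(N−K)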

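The five steps can be followed literally in R and compared with the built-in ANOVA table. The data below are hypothetical (three treatments with four observations each), chosen only to illustrate the procedure.

```r
# Hypothetical observations for three treatments A, B and C
values <- c(8, 10, 12, 10,   12, 14, 13, 13,   18, 17, 19, 18)
group  <- factor(rep(c("A", "B", "C"), each = 4))

N <- length(values)    # total number of observations
K <- nlevels(group)    # number of treatments

grand_total <- sum(values)                 # Step 1: grand total T
CF   <- grand_total^2 / N                  # Step 2: correction factor
TSS  <- sum(values^2) - CF                 # Step 3: total sum of squares
Ti   <- tapply(values, group, sum)         # treatment totals
ni   <- tapply(values, group, length)      # treatment sizes
TrSS <- sum(Ti^2 / ni) - CF                # Step 4: treatment sum of squares
ESS  <- TSS - TrSS                         # Step 5: error sum of squares

F_cal <- (TrSS / (K - 1)) / (ESS / (N - K))   # ratio of mean squares

# Built-in one-way ANOVA table for comparison
anova_table <- anova(lm(values ~ group))
print(anova_table)
print(F_cal)
```

The hand-computed sums of squares should match the "Sum Sq" column of the printed table, and F_cal should match its "F value".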
BASIC STATISTICS:
Average: a number expressing the central or typical value in a set of data, in
particular the mode, median, or (most commonly) the mean, which is calculated
by dividing the sum of the values in the set by their number. The basic formula for
the average of n numbers x1,x2,…,xn is

A=(x1+x2+…+xn)/n

Ex:
Suppose there are 8 data points 2,4,4,4,5,5,7,9. The average of these 8 data points is
A=(2+4+4+4+5+5+7+9)/8
=40/8
=5

R code:
>list=c(2,4,4,4,5,5,7,9)
>print(mean(list))

Variance in R programming language:


Variance is the average of the squared differences between each number and the
mean. The mathematical formula for variance is as follows

$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N}$$

Where,
μ is the mean
N is the total number of elements or frequency of distribution

Example: Let's consider the same dataset that we have taken in average. First,
calculate the deviations of each data point from the mean, and square the result of
each.
(2-5)^2=(-3)^2=9
(4-5)^2=(-1)^2=1
(4-5)^2=(-1)^2=1
(4-5)^2=(-1)^2=1
(5-5)^2=0^2=0
(5-5)^2=0^2=0
(7-5)^2=2^2=4
(9-5)^2=4^2=16

Variance = (9+1+1+1+0+0+4+16)/8
= 32/8
= 4
Computing variance in R programming
Syntax: var(x)
Note that var() computes the sample variance, which divides by N−1 instead of N, so for this data it returns 32/7 ≈ 4.571 rather than 4.

R code:
>list=c(2,4,4,4,5,5,7,9)
>print(var(list))

Standard deviation in R programming language:


Standard deviation is the square root of variance. It is a measure of the
extent to which data varies from the mean. The mathematical formula for calculating
standard deviation is as follows.
Standard deviation = √Variance

Example: standard deviation for the data 2,4,4,4,5,5,7,9
Standard deviation = √4
= 2

R code:
>list=c(2,4,4,4,5,5,7,9)
>sd(list)
Note that sd() is the square root of the sample variance (divisor N−1), so it returns about 2.138 for this data rather than 2.
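To reproduce the hand-calculated values 4 and 2, the population formulas (divisor N) have to be written out, since var() and sd() use the divisor N − 1. A minimal sketch:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

pop_var  <- sum((x - mean(x))^2) / n   # population variance: 32/8 = 4
pop_sd   <- sqrt(pop_var)              # population standard deviation: 2

samp_var <- var(x)                     # sample variance: 32/7, about 4.571
samp_sd  <- sd(x)                      # sample standard deviation, about 2.138

print(c(pop_var = pop_var, pop_sd = pop_sd, samp_var = samp_var, samp_sd = samp_sd))
```

The two versions are related by pop_var = samp_var * (n − 1) / n.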

Mean:
It is the sum of observations divided by the total number of observations. It is
also defined as the average, which is the sum divided by the count.

Mean (x̄) = Σx / n
R code:
>x<-c(2,4,4,4,5,5,7,9)
>mean(x)

Median:
It is the middle value of the data set. It splits the data into two halves. If the
number of elements in the data set is odd then the center element is the median, and if it
is even then the median is the average of the two central elements. With the data sorted in ascending order:
Odd n: the ((n+1)/2)-th value
Even n: the average of the (n/2)-th and (n/2 + 1)-th values

R code:
>x<-c(2,4,4,4,5,5,7,9)
>median(x)
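The odd and even rules can be written out by hand and compared with median(). In this sketch the names s and med are only for illustration.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

s <- sort(x)       # the rule applies to the sorted data
n <- length(s)

med <- if (n %% 2 == 1) {
  s[(n + 1) / 2]                  # odd n: the middle value
} else {
  (s[n / 2] + s[n / 2 + 1]) / 2   # even n: average of the two central values
}

print(med)         # 4.5 for this data
print(median(x))   # same result from the built-in function
```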

Mode:
It is the value that has the highest frequency in the given data set. The data
set may have no mode if the frequency of all data points is the same. Also, we can
have more than one mode if we encounter two or more data points having the same
highest frequency. There is no built-in function for finding the statistical mode in R (the built-in mode() function reports the storage mode of an object), so we can create
our own function for finding the mode or we can use the package called modeest.
R code:
> mode <- function(v) {
+   uniqv <- unique(v)
+   # index of the most frequent value among the unique values
+   uniqv[which.max(table(match(v, uniqv)))]
+ }
> v <- c(2,4,4,4,5,5,7,9)
> result <- mode(v)
> print(result)
