R Unit 4

The document covers various probability distributions, including binomial, Poisson, and normal distributions, detailing their definitions, formulas, and applications. It provides R functions for calculating probabilities and examples of problems solved using these distributions. Additionally, it discusses correlation as a relationship between two variables.

Uploaded by

Ganesh

UNIT 4

PROBABILITY DISTRIBUTION:
The probability distribution of a random variable x shows how probability is
distributed over the different values the random variable can take. When the
probabilities are plotted against the values of the random variable, they trace out
the shape of the distribution.

Binomial Distribution:
The binomial distribution is a discrete probability distribution. It describes the
outcome of n independent trials in an experiment. Each trial is assumed to have only
two outcomes, either success or failure. If the probability of a successful trial is p,
then the probability of having x successful outcomes in an experiment of n
independent trials is as follows.
\[ P(x) = \frac{n!}{(n-x)!\,x!}\, p^{x} q^{n-x}, \qquad q = 1 - p \]

n! : This starts the count of the number of ways the event can occur

(n-x)! : This ends the count of the number of ways the event can occur
x! : This removes duplicate orderings
p^x : This is the probability of success on x trials
q^(n-x) : This is the probability of failure on the remaining n-x trials, where q = 1-p

R has four built-in functions for the binomial distribution. They are:

• dbinom(x,size,prob) : This function gives the probability P(X = x) at each point x
(the probability mass function)
• pbinom(x,size,prob) : This function gives the cumulative probability P(X ≤ x). It
is a single value for each x
• qbinom(p,size,prob) : This function takes a probability value p and gives the
smallest number whose cumulative probability is at least p
• rbinom(n,size,prob) : This function generates n random values from the binomial
distribution with the given size and prob

Following is the description of the parameters used

• x is a vector of numbers (numbers of successes)
• p is a vector of probabilities
• n is the number of observations (random values to generate)
• size is the number of trials
• prob is the probability of success on each trial

Problem: Suppose a die is rolled 5 times. What is the probability of getting exactly
2 fours?

Solution: The number of trials: 5

The number of successes: 2
The probability of success on a single trial is 1/6 or about 0.167

P(x=2) = 5C2 * (0.167)^2 * (0.833)^3 ≈ 0.161

R code: dbinom(2,size=5,prob=0.167)
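
As a cross-check, the binomial formula can be written out by hand and compared with dbinom. This is only a sketch; the variable names are illustrative, and it uses the exact p = 1/6 rather than the rounded 0.167.

```r
# Binomial probability of exactly 2 fours in 5 rolls, computed two ways
n <- 5        # number of trials
x <- 2        # number of successes
p <- 1 / 6    # exact probability of a four on one roll

by_formula <- choose(n, x) * p^x * (1 - p)^(n - x)  # n!/((n-x)! x!) * p^x * q^(n-x)
by_dbinom  <- dbinom(x, size = n, prob = p)         # built-in probability mass function

print(c(formula = by_formula, dbinom = by_dbinom))
```

Both values should agree, at roughly 0.161.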

Types of problems:
• dbinom: It is used to find the probability of a specific number of successes in a
fixed number of independent Bernoulli trials
• pbinom: It gives the probability that the number of successes is less than or
equal to a specified value
• qbinom: It gives the smallest value for which the cumulative probability is greater
than or equal to a specified probability
• rbinom: It is used to simulate a random experiment based on the binomial
distribution

Problem: In a restaurant, seventy percent of people order Chinese food and thirty
percent order Italian food. A group of three people enters the restaurant. Find
the probability that at least 2 of them order Italian food.

Solution: The probability of Chinese food: 70% = 0.7

The probability of Italian food: 30% = 0.3
If at least two of them order Italian food, then either two or three of them
order Italian food.

P(x=2) = 3C2 * (0.3)^2 * (0.7)^1
= 3*0.09*0.7
= 0.189

P(x=3) = 3C3 * (0.3)^3 * (0.7)^0
= 1*0.027*1
= 0.027

Hence the probability of at least two persons ordering Italian food is,
P(x>=2) = P(x=2) + P(x=3)
= 0.189 + 0.027
= 0.216

R code: dbinom(2, size=3, prob=0.3) + dbinom(3, size=3, prob=0.3)
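
The same "at least 2" probability can also be read from the upper tail of the cumulative function. A minimal sketch, with illustrative variable names:

```r
# P(X >= 2) for 3 trials with success probability 0.3, computed two ways
p_sum   <- dbinom(2, size = 3, prob = 0.3) + dbinom(3, size = 3, prob = 0.3)
p_upper <- pbinom(1, size = 3, prob = 0.3, lower.tail = FALSE)  # P(X > 1) = P(X >= 2)

print(c(sum_of_terms = p_sum, upper_tail = p_upper))
```

Note that the upper tail is taken at 1, not 2, because lower.tail=FALSE gives P(X > q).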

Cumulative binomial probability:
A cumulative binomial probability refers to the probability that the binomial
random variable falls within a specified range (e.g., is greater than or equal to a
stated lower limit and less than or equal to a stated upper limit)

Problem: What is the probability of obtaining 45 or fewer heads in 100 tosses of a
coin?

Solution: To solve this problem, we compute 46 individual probabilities (x = 0, 1, ..., 45)
using the binomial formula. The sum of all these probabilities is the answer we seek.
Thus,
b(x<=45,100,0.5) = b(x=0,100,0.5) + b(x=1,100,0.5) + ... + b(x=45,100,0.5)

R code: pbinom(45, size=100, prob=0.5)
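
The sum described in the solution can be written directly by passing a vector to dbinom. This sketch (illustrative names) shows it matches pbinom:

```r
# Sum of the individual probabilities for x = 0, 1, ..., 45
p_sum <- sum(dbinom(0:45, size = 100, prob = 0.5))

# Cumulative probability P(X <= 45) in one call
p_cdf <- pbinom(45, size = 100, prob = 0.5)

print(c(sum_of_terms = p_sum, cumulative = p_cdf))
```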

Problem: Suppose there are twelve multiple choice questions in an English class quiz.
Each question has five possible answers, and only one of them is correct. Find
the probability of having four or fewer correct answers if a student attempts to
answer every question at random.

Solution: Since only one out of five possible answers is correct, the probability of
answering a question correctly at random is 1/5=0.2

The probability of having exactly 4 correct answers by random attempts is found
as follows
dbinom(4,size=12,prob=0.2)

To find the probability of having four or fewer correct answers by random
attempts we apply the function dbinom with x=0,1,...,4

R code:
dbinom(0,size=12,prob=0.2)+dbinom(1,size=12,prob=0.2)+dbinom(2,size=12,
prob=0.2)+dbinom(3,size=12,prob=0.2)+dbinom(4,size=12,prob=0.2)

Alternatively, we can use the cumulative probability function for the binomial
distribution, pbinom
pbinom(4,size=12,prob=0.2)

Poisson Distribution:
The Poisson distribution is the probability distribution of the number of independent
event occurrences in an interval. If λ is the mean number of occurrences per interval,
then the probability of having x occurrences within a given interval is:

\[ p(x) = \frac{\lambda^{x} e^{-\lambda}}{x!} \]

Where: p(x): probability of x occurrences given λ

λ: Expected (mean) number of occurrences per interval
e : 2.71828 (base of natural logarithms)
x : Number of occurrences per interval

Mean: μ = E(x) = λ
Standard deviation: σ = √λ
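
A small numerical check of the Poisson formula and of its mean and standard deviation (λ and √λ). The rate lambda = 4 is a hypothetical value chosen only for illustration; it does not come from the notes.

```r
lambda <- 4      # hypothetical mean rate, for illustration only
x <- 0:200       # the probability beyond 200 is negligible for this lambda

pmf   <- dpois(x, lambda)                  # P(X = x) for each x
mu    <- sum(x * pmf)                      # mean: should equal lambda
sigma <- sqrt(sum((x - mu)^2 * pmf))       # standard deviation: should equal sqrt(lambda)

# The formula lambda^x * e^(-lambda) / x! written out for x = 0..10
by_formula <- lambda^(0:10) * exp(-lambda) / factorial(0:10)

print(c(mean = mu, sd = sigma))
```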

Examples:
1. The number of defective electric bulbs manufactured by a reputed company
2. The number of telephone calls per minute at a switch board
3. The number of cars passing a certain point in one minute
4. The number of printing mistakes per page in a large text

R has four built-in functions for the Poisson distribution. They are described
below:
• dpois(x,lambda,log=FALSE) : This function gives the probability P(X = x) at each
point x
• ppois(q,lambda,lower.tail=TRUE,log.p=FALSE) : This function gives the
cumulative probability P(X ≤ q), or P(X > q) when lower.tail=FALSE
• qpois(p,lambda,lower.tail=TRUE,log.p=FALSE) : This function takes a
probability value p and gives the smallest number whose cumulative probability
is at least p
• rpois(n,lambda) : This function generates n random values from the Poisson
distribution with mean lambda

Following is the description of the parameters used:

• x is a vector of numbers (non-negative integer counts)
• q is a vector of numbers (quantiles)
• p is a vector of probabilities
• n is the number of observations (random values to generate)
• lambda is the mean number of occurrences per interval

Problem: If there are twelve cars crossing a bridge per minute on average, find the
probability of having seventeen or more cars crossing the bridge in a
particular minute

Solution: The probability of having 16 or fewer cars crossing the bridge in a particular
minute is given by the function ppois

ppois(16,lambda=12) #lower tail

Hence the probability of having seventeen or more cars crossing the bridge in
a minute is in the upper tail of the probability distribution
ppois(16,lambda=12,lower.tail=FALSE)

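
The lower and upper tails are complements, which the following sketch confirms (variable names are illustrative):

```r
p_lower <- ppois(16, lambda = 12)                       # P(X <= 16)
p_upper <- ppois(16, lambda = 12, lower.tail = FALSE)   # P(X >= 17)

print(c(lower = p_lower, upper = p_upper, total = p_lower + p_upper))
```

The upper tail comes out at roughly 0.10, so about one minute in ten sees seventeen or more cars.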
Problem: The average number of homes sold by the Acme Realty company is 2
homes per day. What is the probability that exactly 3 homes will be sold
tomorrow?
Solution: This is a Poisson experiment in which we know the following:
μ = 2, since 2 homes are sold per day on average
x = 3, since we want to find the likelihood that 3 homes will be sold tomorrow
e = 2.71828, since e is a constant equal to approximately 2.71828

Poisson formula
P(x;μ) = (e^-μ)(μ^x)/x!
= (2.71828^-2)(2^3)/3!
= (0.13534)(8)/6
= 0.180

R code:
dpois(3,lambda=2)
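
The hand calculation can be reproduced step by step in R. A minimal sketch with illustrative variable names:

```r
mu <- 2   # average homes sold per day
x  <- 3   # number of homes we ask about

by_formula <- exp(-mu) * mu^x / factorial(x)   # e^(-mu) * mu^x / x!
by_dpois   <- dpois(x, lambda = mu)            # built-in equivalent

print(round(c(formula = by_formula, dpois = by_dpois), 3))
```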

Cumulative Poisson distribution:

A cumulative Poisson probability refers to the probability that
the Poisson random variable falls within a specified range (e.g., is greater than or
equal to a stated lower limit and less than or equal to a stated upper limit).

Problem: Suppose the average number of lions seen on a 1-day safari is 5. What is the
probability that tourists will see fewer than four lions on the next 1-day safari?

Solution: This is a Poisson experiment in which we know the following:

μ = 5, since 5 lions are seen per safari on average
x = 0, 1, 2 or 3, since we want to find the likelihood that tourists will see fewer
than 4 lions, that is, the probability that they will see 0, 1, 2 or 3 lions
e = 2.71828, since e is a constant equal to approximately 2.718

To solve this problem, we need to find the probability that tourists will see
0, 1, 2 or 3 lions. Thus, we need to calculate the sum of four probabilities:
P(x<=3;5) = P(0;5)+P(1;5)+P(2;5)+P(3;5)
= [(e^-5)(5^0)/0!] + [(e^-5)(5^1)/1!] + [(e^-5)(5^2)/2!] + [(e^-5)(5^3)/3!]
= [0.006738] + [0.03369] + [0.084224] + [0.140374]
= 0.2650
Thus the probability of seeing no more than 3 lions is 0.2650

R code:
>ppois(3,lambda=5)
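
The four terms in the hand calculation can be listed and summed in one step. A sketch with illustrative names:

```r
terms <- dpois(0:3, lambda = 5)   # P(0;5), P(1;5), P(2;5), P(3;5)
p_sum <- sum(terms)               # sum of the four probabilities
p_cdf <- ppois(3, lambda = 5)     # cumulative probability P(X <= 3)

print(round(terms, 6))
print(round(c(sum_of_terms = p_sum, cumulative = p_cdf), 4))
```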

Normal Distribution:
A continuous random variable x follows a normal distribution with mean μ
and variance σ² if its probability density function is
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
Standard normal distribution:
It is the distribution that occurs when a normal random variable has a mean
of zero and a standard deviation of one.
The normal random variable of a standard normal distribution is called a standard
score or a Z score. Every normal random variable x can be transformed into a Z score
via the following equation
z = (x − μ)/σ
Where x is a normal random variable, μ is the mean, and σ is the standard deviation,
yielding
\[ P(z)\,dz = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz \]

Standard normal curve:

One way of figuring out how data are distributed is to plot them in a graph. If
the data are normally distributed, you may come up with a bell curve. A bell curve has a
small percentage of the points on both tails and the bigger percentage on the inner
part of the curve. The standard normal curve has the following properties:

• Mean=median=mode
• Symmetry about the center
• 50% of values less than the mean and 50% greater than the mean
R functions:
• dnorm(x,mean=0,sd=1,log=FALSE) : This function gives the value of the probability
density function at each point x.
• pnorm(q,mean=0,sd=1,lower.tail=TRUE,log.p=FALSE) : This function gives the
cumulative probability P(X ≤ q), or P(X > q) when lower.tail=FALSE
• qnorm(p,mean=0,sd=1,lower.tail=TRUE,log.p=FALSE) : This function takes a
probability value p and gives the number whose cumulative probability equals p
• rnorm(n,mean=0,sd=1) : This function generates n random values from the normal
distribution with the given mean and sd
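
A short sketch of the four functions on the standard normal distribution. The inputs 1.96 and 0.975 and the seed are illustrative choices, not values from the notes.

```r
d0 <- dnorm(0)        # density at the mean: 1 / sqrt(2 * pi)
p  <- pnorm(1.96)     # P(Z <= 1.96), about 0.975
q  <- qnorm(0.975)    # the z value with cumulative probability 0.975, about 1.96

set.seed(42)          # arbitrary seed so the random draw is reproducible
z  <- rnorm(5)        # five random standard normal values

print(c(density = d0, cumulative = p, quantile = q))
```

qnorm is the inverse of pnorm, so qnorm(pnorm(1.96)) returns 1.96.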

Procedure to find probability using positive Z score table

Case    Region                                                     Probability
Case 1  Area between 0 and any Z score                             Area(z)
Case 2  Area in any tail                                           0.5-Area(z)
Case 3  Area between two Z scores on the same side of the mean     |Area(z2)-Area(z1)|
Case 4  Area between two Z scores on opposite sides of the mean    Area(z1)+Area(z2)
Case 5  Area to the left of a positive Z score                     0.5+Area(z)
Case 6  Area to the right of a negative Z score                    0.5-Area(z), with Area(z) taken as negative for a negative z (that is, 0.5+Area(|z|))
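
The six cases can be checked against pnorm. In this sketch, area() is a hypothetical helper standing in for the positive Z table (the area between 0 and |z|), and the z values 1, 1.25 and 2 are arbitrary illustrations.

```r
# Hypothetical helper: table area between 0 and |z|
area <- function(z) pnorm(abs(z)) - 0.5

case1 <- area(1.25)                # between 0 and z = 1.25
case2 <- 0.5 - area(1.25)          # tail beyond z = 1.25
case3 <- abs(area(2) - area(1))    # between z1 = 1 and z2 = 2 (same side)
case4 <- area(-1) + area(2)        # between z1 = -1 and z2 = 2 (opposite sides)
case5 <- 0.5 + area(1.25)          # left of the positive score z = 1.25
case6 <- 0.5 + area(-1.25)         # right of the negative score z = -1.25

print(round(c(case1, case2, case3, case4, case5, case6), 4))
```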
Problem: X is a normally distributed variable with mean μ = 30 and standard
deviation σ = 4. Find
(a) P(x< 40)
(b) P(x>21)
(c) P(30<x<35)
Solution: (a) For x=40,
Z=(x − μ)/σ
=(40-30)/4
=2.5 (=z1 say)
Hence P(x<40)=P(z<2.5)
=0.5+A(z1)
=0.5+0.4938
=0.9938

(b) For x=21,

Z=(x − μ)/σ
=(21-30)/4
=-2.25 (=z1 say)
Hence P(x>21)=P(z>-2.25)
=0.5+A(2.25)
=0.5+0.4878
=0.9878

(c) For x=30,

Z=(x − μ)/σ
=(30-30)/4
=0
For x=35,
Z=(x − μ)/σ
=(35-30)/4
=1.25
Hence P(30<x<35)=P(0<z<1.25)
=A(1.25)
=0.3944
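
The three answers can be confirmed with pnorm without converting to Z scores by hand. A sketch with illustrative variable names:

```r
pa <- pnorm(40, mean = 30, sd = 4)                       # (a) P(x < 40)
pb <- pnorm(21, mean = 30, sd = 4, lower.tail = FALSE)   # (b) P(x > 21)
pc <- pnorm(35, mean = 30, sd = 4) -
      pnorm(30, mean = 30, sd = 4)                       # (c) P(30 < x < 35)

print(round(c(a = pa, b = pb, c = pc), 4))
```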

Problem: The length of life of an instrument produced by a machine has a normal
distribution with a mean of 12 months and standard deviation of 2 months.
Find the probability that an instrument produced by this machine will last
(a) Less than 7 months
(b) Between 7 and 12 months

Solution: μ = 12
σ=2
(a) Less than 7 months
P(x<7)
For x=7,
Z=(x − μ)/σ
=(7-12)/2
=-2.5 (=z1 say)
Hence P(x<7)=P(z<-2.5)
=0.5+(-0.4938)
=0.0062

R code: pnorm(7,mean=12,sd=2)

(b) Between 7 and 12 months

P(7<x<12)
For x=12,
Z=(x − μ)/σ
=(12-12)/2
=0

For x=7,

Z=(x − μ)/σ
=(7-12)/2
=-2.5 (=z1 say)

Hence P(7<x<12) = P(-2.5<z<0)

= 0.4938

Problem: Assume that the test scores of a college entrance exam fit a normal
distribution. Furthermore, the mean test score is 72, and the standard
deviation is 15.2. What is the percentage of students scoring 84 or more in
the exam?
Solution: The upper tail of the normal distribution gives the answer, about 0.215,
that is, about 21.5% of students.
R code:
pnorm(84,mean=72,sd=15.2, lower.tail=FALSE)

CORRELATION:
A correlation is a relationship between two variables. Typically, we take x to
be the independent variable and y to be the dependent variable. Data is
represented by a collection of ordered pairs (x,y). The correlation coefficient is
\[ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \]

This will always be a number between -1 and 1(inclusive)

• If r is close to 1, we say that the variables are positively correlated. This means
there is likely a strong linear relationship between the two variables, with a
positive slope
• If r is close to -1, we say that the variables are negatively correlated. This means
there is likely a strong linear relationship between the two variables, with a
negative slope
• If r is close to 0, we say that the variables are not correlated. This means that
there is likely no linear relationship between the two variables. However, the
variables may still be related in some other way.
To run a correlation test we type:
cor.test(var1,var2,method="method")

The default method is "pearson", so you may omit this if that is what you want. If you
type "kendall" or "spearman" then you will get the appropriate significance test.

Problem: The local ice cream shop keeps track of how much ice cream they sell
versus the temperature on that day. Here are their figures for the last 12 days:

Temp (℃)    Ice cream sales
14.2        $215
16.4        $325
11.9        $185
15.2        $332
18.5        $406
22.1        $522
19.4        $412
25.1        $614
23.4        $544
18.1        $421
22.6        $445
17.2        $408

R code:
> temp <- c(14.2,16.4,11.9,15.2,18.5,22.1,19.4,25.1,23.4,18.1,22.6,17.2)
> sales <- c(215,325,185,332,406,522,412,614,544,421,445,408)
> corr_coeff <- cor(temp,sales)
> corr_coeff
[1] 0.9575066
> cov(temp,sales)
[1] 484.0932
> # Scatter plot of sales against temperature
> plot(temp, sales, pch=16,col="red")
> # Adds a line of best fit to the scatter plot
> abline(lm(sales~temp),col="blue")
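
The correlation formula can be applied by hand to the same data and compared with cor, followed by a significance test with cor.test. This is a sketch; dx, dy, r_manual and res are illustrative names.

```r
temp  <- c(14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2)
sales <- c(215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408)

dx <- temp - mean(temp)      # deviations of x from its mean
dy <- sales - mean(sales)    # deviations of y from its mean

# Sum of products of deviations over the square root of the product of sums of squares
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

res <- cor.test(temp, sales)   # Pearson test by default
print(c(manual = r_manual, cor = cor(temp, sales)))
print(res$p.value)             # a small p-value indicates a significant correlation
```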

Covariance in R programming language:

In R programming, covariance can be measured using the cov() function.
Covariance is a statistical term used to measure the direction of the linear relationship
between two data vectors.
Mathematically,
\[ \mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N} \]
Where,
x represents the x data vector
y represents the y data vector
\bar{x} represents the mean of x data vector
\bar{y} represents the mean of y data vector
N represents total observations
Note that R's cov() computes the sample covariance, which divides by N-1 rather than N.

Syntax:
>cov(x,y,method)
Where,
x and y represent the data vectors
method defines the type of method to be used to compute covariance
Default is "pearson"

Ex:
>x<-c(1,3,5,10)
>y<-c(2,4,6,20)
>print(cov(x,y))
>print(cov(x,y,method="pearson"))
>print(cov(x,y,method="kendall"))
>print(cov(x,y,method="spearman"))
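
To see what cov computes for the default method, the covariance of the example vectors can be worked out by hand. A sketch with illustrative names; note the divisor is the number of observations minus one.

```r
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

# Sum of products of deviations, divided by (number of observations - 1)
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)

print(c(manual = cov_manual, cov = cov(x, y)))
```

The positive value indicates that y tends to increase as x increases.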

T Test for single mean:

One sample t-test is used to compare the mean of a population to a specified
theoretical mean (μ0).
Let x represent a set of values of size n, with sample mean x̄ and standard
deviation s. The comparison of the observed mean x̄ to the
theoretical value μ0 is performed with the formula below

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
To evaluate whether the difference is statistically significant, you first have to read in
the t-test table the critical value of Student's t distribution corresponding to the
significance level alpha of your choice (for example 5%). The degrees of freedom (df) used in this
test are df=n-1

Problem: A professor wants to know if her introductory statistics class has a good
grasp of basic math. Six students are chosen at random from the class and
given a math proficiency test. The professor wants the class to be able to
score above 70 on the test. The six students get scores of 62, 92, 75, 68, 83 and
95. Can the professor have 90 percent confidence that the mean score for the
class on the test would be above 70?
Solution:
Null Hypothesis H0: μ=70
Alternative Hypothesis Ha: μ>70
Level of significance: α = 0.10 (90 percent confidence, one-tailed)
First, compute the sample mean and standard deviation
x̄ = (62+92+75+68+83+95)/6
= 475/6
= 79.17 and standard deviation s = 13.17

The test statistic is t = (x̄ − μ0)/(s/√n)

t = (79.17 − 70)/(13.17/√6)
= 9.17/5.38
= 1.71 (calculated value of t)

To test the hypothesis, the computed t-value of 1.71 is compared to the
critical value in the t-table, which for 5 df at the 10% level (one-tailed) is 1.476.
The calculated value of t is more than the table value of t, so the null hypothesis
is rejected: the professor can be 90 percent confident that the class mean would be
above 70.

R code:
> x <- c(62,92,75,68,83,95)
> t.test(x,alternative="greater",mu=70,conf.level=0.90)
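
The test statistic and the critical value can also be computed directly, which makes the link between the formula and t.test explicit. A sketch with illustrative names; the critical value uses qt for a one-tailed test at the 10% level.

```r
scores <- c(62, 92, 75, 68, 83, 95)
n <- length(scores)

t_manual <- (mean(scores) - 70) / (sd(scores) / sqrt(n))   # (x-bar - mu0) / (s / sqrt(n))
t_crit   <- qt(0.90, df = n - 1)                           # one-tailed critical value, 5 df

res <- t.test(scores, mu = 70, alternative = "greater")
print(round(c(t_manual = t_manual, t_crit = t_crit, p_value = res$p.value), 3))
```

Because the statistic exceeds the critical value, the one-tailed p-value is below 0.10.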
Problem: A sample of 26 bulbs gives a mean life of 990 hours with S.D of 20 hours.
The manufacturer claims that the mean life of bulbs is 1000 hours. Does the sample
meet the standard?
Solution:
Here n=26
Sample mean x̄ = 990 hours
S.D s=20 hours
Population mean μ0=1000 hours
df=n-1
= 26-1
=25

• Null Hypothesis H0: The sample meets the standard, i.e., μ=1000 hours
• Alternative Hypothesis Ha: μ not equal to 1000
• Level of significance α =0.05
• The test statistic is

t = (x̄ − μ0)/(s/√n)
= (990 − 1000)/(20/√26)
= −10/3.92
= −2.55, so |t| = 2.55 (calculated value of t)
Table value of t with 25 df at the 5% level is 2.060 for this two-sided test
(1.708 for a one-sided test)
The calculated |t| is more than the table value of t, so the null hypothesis is
rejected at the 5% level
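
Since only summary statistics are given, t.test cannot be called directly; the statistic can be computed from the formula instead. A sketch with illustrative variable names, using s/√n as in the formula above:

```r
n    <- 26      # sample size
xbar <- 990     # sample mean life in hours
s    <- 20      # sample standard deviation in hours
mu0  <- 1000    # claimed mean life in hours

t_stat <- (xbar - mu0) / (s / sqrt(n))          # test statistic
t_crit <- qt(0.975, df = n - 1)                 # two-sided critical value at the 5% level
p_two  <- 2 * pt(-abs(t_stat), df = n - 1)      # two-sided p-value

print(round(c(t = t_stat, critical = t_crit, p_value = p_two), 4))
```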

Paired Comparison:(paired t-test)

Sometimes data come from non-independent samples. An example might
be testing "before and after" of cosmetics or consumer products. We could use a
single random sample and do "before and after" tests on each person. A hypothesis
test based on these data would be called a paired comparisons test. Since the
observations come in pairs, we can study the difference d between the samples. The
difference between each pair of measurements is called di.

Test statistic: With a sample of n pairs of measurements, forming a simple
random sample from a normally distributed population, the mean of the differences
d̄ is tested using the following implementation of t.

\[ t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} \]
where μ_d is the hypothesized mean difference (0 when testing for no change) and
s_d is the standard deviation of the differences.

Problem: The blood pressure of 5 women before and after intake of a certain drug
is given below. Test whether there is a significant change in blood pressure at
the 1% level of significance

Woman    1    2    3    4    5
Before   110  120  125  132  125
After    120  118  125  136  121

Solution: Let μ be the mean of the population of differences
• Null Hypothesis H0: μ1=μ2 i.e., no change in BP
• Alternative Hypothesis Ha: μ1 ≠ μ2 i.e., a change in BP
• Level of significance α=0.01
• Computation: Differences di (before minus after) are -10, 2, 0, -4, 4

d̄ = (−10 + 2 + 0 + (−4) + 4)/5
= −8/5
= −1.6

\[ S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2 = \frac{1}{4}\sum_{i=1}^{5} (d_i - \bar{d})^2 \]

= (1/4)[(−10 + 1.6)^2 + (2 + 1.6)^2 + (0 + 1.6)^2 + (−4 + 1.6)^2 + (4 + 1.6)^2]
= 123.20/4
= 30.8

S = √30.8 = 5.55

• Test statistic: The test statistic is t, which is calculated as

t = (d̄ − μ)/(s/√n)
= (−1.6 − 0)/(5.55/√5)
= −0.645
Calculated |t| value is 0.645
Tabulated t0.01 with 5-1=4 degrees of freedom is 4.604 for the two-sided test
(3.747 for a one-sided test)
Since calculated |t| < t0.01, we accept the null hypothesis and conclude that there
is no significant change in blood pressure

R code:
>x<-c(110,120,125,132,125)
>y<-c(120,118,125,136,121)
>t.test(x,y,paired=TRUE)
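
The paired test is a one-sample t-test on the differences, which the following sketch makes explicit. The names before, after and d are illustrative.

```r
before <- c(110, 120, 125, 132, 125)
after  <- c(120, 118, 125, 136, 121)
d <- before - after                                   # differences: -10, 2, 0, -4, 4

t_manual <- mean(d) / (sd(d) / sqrt(length(d)))       # d-bar / (s / sqrt(n))
t_crit   <- qt(0.995, df = length(d) - 1)             # two-sided critical value at the 1% level

res <- t.test(before, after, paired = TRUE)
print(round(c(t_manual = t_manual, t_crit = t_crit, p_value = res$p.value), 3))
```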

T-test for difference of two population means:

With a two-sample t-test, we compare the population means to each other and
again look at the difference. We expect that x̄ − ȳ would be close to μ1−μ2. The
test statistic uses both sample means, sample standard deviations, and sample
sizes. A two-sample t-test proceeds as follows.
• Write the null and alternative hypotheses
• State the level of significance and find the critical value. The critical value,
from Student's t-distribution, has n1+n2-2 degrees of freedom for the pooled
statistic below (a conservative alternative is the lesser of n1-1 and n2-1)
• Compute the test statistic
• Compare the test statistic to the critical value and state a conclusion

\[ t = \frac{\bar{x} - \bar{y}}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1+n_2-2} \]

Where
\[ S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} \]
(with s1² and s2² the sample variances computed with divisors n1 and n2)
Or
\[ S^2 = \frac{\sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2}{n_1 + n_2 - 2} \]
Problem: Two horses A and B were tested according to the time (in seconds) to
run a particular track with the following results

Horse A 28 30 32 33 33 29 34
Horse B 29 30 30 24 27 29
Test whether the two horses have the same running capacity

Solution:
Given n1=7 and n2=6
We first compute the sample means and standard deviations.
x̄ = Mean of the first sample
= (1/7)(28 + 30 + 32 + 33 + 33 + 29 + 34)
= (1/7)(219)
= 31.286

ȳ = Mean of the second sample
= (1/6)(29 + 30 + 30 + 24 + 27 + 29)
= (1/6)(169)
= 28.16

x          x-x̄       (x-x̄)^2    y          y-ȳ      (y-ȳ)^2
28         -3.286    10.8       29         0.84     0.7056
30         -1.286    1.6538     30         1.84     3.3856
32         0.714     0.51       30         1.84     3.3856
33         1.714     2.94       24         -4.16    17.3056
33         1.714     2.94       27         -1.16    1.3456
29         -2.286    5.226      29         0.84     0.7056
34         2.714     7.366
Total=219            31.4359    Total=169           26.8336

Now: S^2 = [Σ(x_i − x̄)^2 + Σ(y_j − ȳ)^2]/(n1+n2−2)
= (31.4359+26.8336)/(7+6−2)
= 5.30

Therefore: S = √5.30
= 2.3

• Null Hypothesis H0: μ1=μ2

• Alternative Hypothesis Ha: μ1 ≠ μ2
• Level of significance α=0.05
• Computation:
t = (x̄ − ȳ)/(S √(1/n1 + 1/n2))
= (31.286 − 28.16)/((2.3) √(1/7 + 1/6))
= 2.443

Tabulated t0.05 with 7+6-2=11 degrees of freedom at the 5% level of significance is
2.2
Since calculated t>t0.05, we reject the null hypothesis and conclude that the two
horses do not have the same running capacity.
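
The pooled two-sample test can be run in R with var.equal=TRUE. This sketch also recomputes the statistic from the pooled variance formula; the variable names are illustrative. The result differs slightly from the hand calculation in the third significant figure because the hand calculation rounds intermediate values.

```r
horse_a <- c(28, 30, 32, 33, 33, 29, 34)
horse_b <- c(29, 30, 30, 24, 27, 29)
n1 <- length(horse_a)
n2 <- length(horse_b)

# Pooled variance: sums of squared deviations over n1 + n2 - 2
s2 <- (sum((horse_a - mean(horse_a))^2) + sum((horse_b - mean(horse_b))^2)) / (n1 + n2 - 2)

t_manual <- (mean(horse_a) - mean(horse_b)) / (sqrt(s2) * sqrt(1 / n1 + 1 / n2))
t_crit   <- qt(0.975, df = n1 + n2 - 2)     # two-sided critical value at the 5% level

res <- t.test(horse_a, horse_b, var.equal = TRUE)
print(round(c(t_manual = t_manual, t_crit = t_crit, p_value = res$p.value), 3))
```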

ANOVA:(Analysis of variance)
When we have only two samples we can use the t-test to compare the means
of the samples, but it might become unreliable in the case of more than two samples. If
we only compare two means, then the t-test (independent samples) will give the
same results as the ANOVA. ANOVA is performed with an F-test.
Null Hypothesis H0: There are no differences among the mean values of the groups
being compared (i.e., the group means are all equal)
H0: μ1 = μ2 = μ3 = ... = μk
Alternative Hypothesis H1 (the conclusion if H0 is rejected):
Not all group means are equal (i.e., at least one group mean is different from the rest)

ANOVA one-way classification:

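Step 1: Grand total of all observations
\[ T = \sum_{i}\sum_{j} x_{ij} \]
Step 2: Correction factor
\[ CF = \frac{T^2}{n} \]
where n is the total number of observations (number of groups × observations per
group when the groups are of equal size)
Step 3: Total sum of squares
\[ TSS = \sum_{i}\sum_{j} x_{ij}^2 - CF \]
Step 4: Treatment sum of squares
\[ TrSS = \sum_{i} \frac{T_i^2}{n_i} - CF \]
where T_i is the total and n_i the number of observations of group i
Step 5: Error sum of squares
\[ ESS = TSS - TrSS \]

Source of variation            df     Sum of squares              Mean sum of squares    F-test
Treatment (between samples)    K-1    TrSS = Σ T_i²/n_i - CF      MSTr = TrSS/(K-1)      F_cal = MSTr/MSE
Error                          n-K    ESS = TSS - TrSS            MSE = ESS/(n-K)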
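
The five steps can be followed in R and compared with the built-in aov function. The data below are hypothetical (three groups of four observations, invented only for illustration), and all variable names are assumptions.

```r
# Hypothetical data: three groups (A, B, C) of four observations each
values <- c(8, 10, 12, 10,   14, 16, 12, 14,   9, 7, 8, 8)
group  <- factor(rep(c("A", "B", "C"), each = 4))
n <- length(values)    # total number of observations
k <- nlevels(group)    # number of groups

total <- sum(values)                         # Step 1: grand total
cf    <- total^2 / n                         # Step 2: correction factor
tss   <- sum(values^2) - cf                  # Step 3: total sum of squares
group_totals <- tapply(values, group, sum)
group_sizes  <- tapply(values, group, length)
trss  <- sum(group_totals^2 / group_sizes) - cf   # Step 4: treatment sum of squares
ess   <- tss - trss                          # Step 5: error sum of squares

f_cal <- (trss / (k - 1)) / (ess / (n - k))  # ratio of the two mean sums of squares

fit <- summary(aov(values ~ group))[[1]]     # ANOVA table from R for comparison
print(c(TrSS = trss, ESS = ess, F = f_cal))
print(fit)
```

The sums of squares and the F value from the manual steps should match the rows of the aov table.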

BASIC STATISTICS:
Average: a number expressing the central or typical value in a set of data, in
particular the mode, median or (most commonly) the mean, which is calculated
by dividing the sum of the values in the set by their number. The basic formula for
the average of n numbers x1,x2,…xn is

A=(x1+x2+…+xn)/n

Ex:
Suppose there are 8 data points 2,4,4,4,5,5,7,9. The average of these 8 data points is,
A=(2+4+4+4+5+5+7+9)/8
=5

R code:
list=c(2,4,4,4,5,5,7,9)
print(mean(list))

Variance in R programming language:

Variance is the average of the squared differences between each number and the
mean. The mathematical formula for variance is as follows
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
Where,
μ is mean
N is the total number of elements or frequency of distribution

Example: Let's consider the same dataset that we have taken in average. First,
calculate the deviations of each data point from the mean, and square the result of
each.
(2-5)^2=(-3)^2=9
(4-5)^2=(-1)^2=1
(4-5)^2=(-1)^2=1
(4-5)^2=(-1)^2=1
(5-5)^2=0^2=0
(5-5)^2=0^2=0
(7-5)^2=2^2=4
(9-5)^2=4^2=16

Variance = (9+1+1+1+0+0+4+16)/8
= 32/8
= 4
Computing variance in R programming
Syntax: var(x)
Note that var() computes the sample variance, which divides by N-1 rather than N,
so for this dataset it returns 32/7 ≈ 4.57 rather than 4.

R code:
>list=c(2,4,4,4,5,5,7,9)
>print(var(list))
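
The difference between the population variance in the worked example and the value returned by var can be seen side by side. A sketch with illustrative names:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

pop_var  <- mean((x - mean(x))^2)   # divides by n, as in the worked example
samp_var <- var(x)                  # divides by n - 1

print(c(population = pop_var, sample = samp_var))
```

Multiplying var(x) by (n-1)/n recovers the population variance.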

Standard deviation in R programming language:

Standard deviation is the square root of variance. It is a measure of the
extent to which data varies from the mean. The mathematical formula for calculating
standard deviation is as follows.
Standard deviation = √σ²

Example: standard deviation for the data 2,4,4,4,5,5,7,9

Standard deviation = √4
= 2

Note that sd() is the square root of var(), so it uses the N-1 divisor and returns
√(32/7) ≈ 2.14 for this dataset rather than 2.

R code:
>list=c(2,4,4,4,5,5,7,9)
>sd(list)

Mean:
It is the sum of observations divided by the total number of observations. It is
also defined as average which is the sum divided by count
\[ \bar{x} = \frac{\sum x}{n} \]
R code:
>x<-c(2,4,4,4,5,5,7,9)
>mean(x)

Median:
It is the middle value of the data set. It splits the data into two halves. If the
number of elements in the data set is odd then the center element is the median, and if it
is even then the median is the average of the two central elements.
Odd: the ((n+1)/2)-th element
Even: the average of the (n/2)-th and (n/2 + 1)-th elements
R code:
>x<-c(2,4,4,4,5,5,7,9)
>median(x)

Mode:
It is the value that has the highest frequency in the given data set. The data
set may have no mode if the frequency of all data points is the same. Also, we can
have more than one mode if we encounter two or more data points having the same
highest frequency. There is no built-in function for finding the mode in R, so we can create
our own function for finding the mode or we can use the package called modeest.
R code:
> mode<-function(v){
+ uniqv<-unique(v)
+ uniqv[which.max(table(match(v,uniqv)))]
+ }
> v<-c(2,4,4,4,5,5,7,9)
> result<-mode(v)
> print(result)

No ratings yet
4TH QUARTER SUMMATIVE TEST Non Fiction
2 pages
INVITATION
No ratings yet
INVITATION
5 pages
Existence and Continuous Dependence of Solutions of A Neutral Functional-Differential Equation
No ratings yet
Existence and Continuous Dependence of Solutions of A Neutral Functional-Differential Equation
18 pages
Readme
No ratings yet
Readme
3 pages
1st Year MOOCS
No ratings yet
1st Year MOOCS
1 page
Algebraic Equations
From Everand
Algebraic Equations
Demetrios P. Kanoussis
No ratings yet
Worked Examples in Mathematics for Scientists and Engineers
From Everand
Worked Examples in Mathematics for Scientists and Engineers
G. Stephenson
No ratings yet

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


