Statistical_Hypothesis_Testing
Hypothesis Testing in R
A statistical hypothesis is an assumption that a researcher makes
about the population from which the data for an experiment are
collected. This assumption need not be true. Hypothesis testing is
a formal process for validating the hypothesis made by the
researcher.
Ideally, validating a hypothesis would take the entire population
into account. However, this is not practically possible. Thus, to
validate a hypothesis, testing is carried out on random samples
drawn from the population.
On the basis of the result of the test over the sample data, the
hypothesis is either retained or rejected.
Hypothesis Testing in R
Statisticians use hypothesis testing to decide formally whether a
hypothesis should be accepted or rejected, by comparing the
decision made from the sample with the true state of the
population.
Decision Errors in R
Two types of error can occur in hypothesis testing: a Type I
error, in which a true null hypothesis is wrongly rejected, and a
Type II error, in which a false null hypothesis is not rejected.
#Author DataFlair
x <- rnorm(10)
y <- rnorm(10)
t.test(x, y)
The default behaviour of the t.test() command can be overridden.
To do so, add the var.equal = TRUE instruction to the standard
t.test() command. This instruction forces the t.test() command to
assume that the variances of the two samples are equal, so a
pooled variance estimate is used.
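For instance, a minimal sketch of the pooled-variance (classic Student's) form of the test; the vectors here are freshly simulated, so the exact numbers will differ from run to run:

```r
# Two-sample t-test assuming equal variances (pooled-variance Student's t-test)
set.seed(42)                    # for reproducibility
x <- rnorm(10)
y <- rnorm(10)
t.test(x, y, var.equal = TRUE)  # method reported as "Two Sample t-test", not "Welch"
```

Without var.equal = TRUE, R applies the Welch correction for unequal variances instead.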
One-Sample T-testing in R
Researchers collect large amounts of data from various sources
and test hypotheses on random samples drawn from it. In many
situations the population a sample comes from is unknown, and
samples are tested in order to draw conclusions about that
population. The one-sample T-test is one of the useful tests for
examining a sample's population.
This test is used for testing the mean of a sample. For example,
you can use it to test whether a sample of students from a
particular college is identical to, or different from, the general
student population. In this situation, the hypothesis tests whether
the sample comes from a known population with a known mean (m) or
from an unknown population.
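A minimal sketch of such a test; the sample vector and the hypothesised mean mu = 5 below are chosen purely for illustration:

```r
# One-sample t-test: does the sample mean differ from a hypothesised mean?
set.seed(1)
y <- rnorm(25, mean = 5)  # illustrative sample
t.test(y, mu = 5)         # mu is the hypothesised population mean
```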
In many cases, you are simply testing to see if the means of two
samples are different, but you may want to know if a sample
mean is lower or greater than another sample mean. You can use
the alternative = instruction to switch the emphasis from a
two-sided test (the default) to a one-sided test. The choices are
"two.sided", "less", or "greater", and the choice can be
abbreviated, as shown in the following command:
#Author DataFlair
t.test(y, mu = 5, alternative = 'greater')
Formula Syntax and Subsetting Samples in the T-test in R
As discussed in the previous sections, the T-test is designed to
compare two samples.
If your predictor column contains more than two levels, the T-test
cannot be used directly; however, you can still carry out a test by
subsetting this predictor column and specifying the two samples
you want to compare.
You first specify the column from which you want to take your
subset and then type %in%. This tells the command that the list
that follows is in the graze column. Note that you have to put the
levels in quotes; here you compare "mow" and "unmow", and the
result is identical to the one you obtained before.
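A hedged sketch of this subsetting syntax: the original data are not reproduced here, so the grass data frame below is a hypothetical stand-in, and the column names rich and graze are assumptions.

```r
# Hypothetical stand-in data: species richness under three grazing
# treatments; only "mow" and "unmow" are compared via subset
grass <- data.frame(
  rich  = c(12, 15, 17, 11, 15, 8, 9, 7, 20, 22, 19),
  graze = c(rep("mow", 5), rep("unmow", 3), rep("graze", 3))
)
# formula syntax: response ~ predictor, restricted to two levels
t.test(rich ~ graze, data = grass, subset = graze %in% c("mow", "unmow"))
```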
U-test in R
When you have two samples to compare and your data are
nonparametric, you can use the U-test. This test goes by various
names and may be known as the Mann-Whitney U-test or the
Wilcoxon rank-sum test. The wilcox.test() command can carry out
the analysis.
The wilcox.test() command can conduct two-sample or one-
sample tests, and you can add a variety of instructions to carry
out the test.
By default, the confidence intervals are not calculated and the p-
value is adjusted using the “continuity correction”; a message
tells you that the latter has been used. In this case, you see a
warning message because you have tied values in the data. If you
set exact = FALSE, this message would not be displayed because
the p-value would be determined from a normal approximation
method.
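A minimal sketch of a two-sample U-test with made-up values that contain ties; setting exact = FALSE makes R use the normal approximation, so no warning about exact p-values is produced:

```r
# Two-sample Mann-Whitney U-test on data with tied values
a <- c(3, 5, 5, 7, 8, 9)
b <- c(2, 4, 5, 5, 6, 7)
wilcox.test(a, b, exact = FALSE)  # p-value from the normal approximation
```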
One-Sample U-test in R
When you specify a single numerical vector, the wilcox.test()
command carries out a one-sample U-test. The default is to set
mu = 0. For example:
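A minimal sketch; the vector y here is simulated purely for illustration:

```r
# One-sample U-test: by default the sample is compared against mu = 0
set.seed(7)
y <- rnorm(12)
wilcox.test(y)  # equivalent to wilcox.test(y, mu = 0)
```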
You can also use additional instructions, as you could with the
other syntax. If the predictor variable contains more than two
samples, you cannot conduct a U-test directly; instead, use a
subset that contains exactly two samples.
This example used the Spearman rho correlation, but you can
also apply Kendall's tau by specifying method = "kendall". Note
that you can abbreviate this, but you still need the quotes, and
you have to use lowercase.
If your vectors are within a data frame or some other object, you
need to extract them in a different fashion.
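As a brief sketch of both points, using R's built-in women data frame:

```r
# Kendall's tau requires method = "kendall" (lowercase, in quotes);
# vectors inside a data frame are extracted with the $ operator
cor(women$height, women$weight, method = "kendall")

# with() is an alternative way to reach columns of a data frame
with(women, cor(height, weight, method = "spearman"))
```

Both values are exactly 1 here, because weight increases monotonically with height in this data.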
Covariance in R
The cov() command uses syntax similar to the cor() command to
examine covariance.
We can use the cov() command as:
set.seed(5)
x <- rnorm(30, sd=runif(30, 2, 50))
mat <- matrix(x,10)
V <- cov(mat)
V
The cov2cor() command determines the correlation from a matrix
of covariance, as shown in the following command:
cov2cor(V)
Significance Testing in Correlation Tests
You can apply a significance test to your correlations by using
the cor.test() command. In this case, you can compare only two
vectors at a time, as shown in the following command:
#DataFlair
cor.test(women$height, women$weight)
In the previous example, you can see the Pearson correlation
between height and weight in the women data, and the result
also shows the statistical significance of the correlation.