Dissertation: Testing of Hypothesis




Submitted by – Deeksha Chawla

Course – BSc PMS
Year / Sem – 3 / 5
Enrolment No. – 2200104141
University – Integral University

Submitted to – Dr. Quazzafi Rabbani Sir

List of Contents
1. Introduction
i) Sampling
ii) Parameter
iii) Estimation
iv) Parameter
v) Sample Statistic
vi) Parametric data (give an introduction and include the history)
2. Assumptions of parametric tests
3. Statistical hypothesis
i) Null Hypothesis
ii) Alternate Hypothesis
4. Significance
i) power of test
*(proof of power = 1-β)
ii) Level of significance
iii) Single-tail test
iv) Double-tailed test
v) p value
vi) Critical region
vii) Acceptance region
viii) Error i. type 1 ii. Type 2
ix) Degree of freedom
5. Steps involved in hypothesis testing
6. Importance of testing of hypothesis/Utilization of TOH
7. Types of Tests of Significance
i) Student’s t-test
a. Test for single mean
b. difference of means
c. Paired t-test
ii) Z- test – sample size >=30
iii) F – test
iv) Chi-Square Test
a. to test goodness of fit
b. to test the independence of attributes
a. 1 way
b. 2 way
c. repeated measures ANOVA
v. Linear Regression for modelling relationships between variables
Most powerful test and uniform powerful test
Challenges and limitations of hypothesis testing
How to improve hypothesis testing skills
Neyman J. and Pearson, E.S. Lemma Theorem

Unbiased Test and Unbiased Critical Region + Theorem

8. Drawbacks
In today’s data-driven world, decisions are constantly based on data.
Hypothesis testing plays a crucial role in that process, whether in business
decisions, the health sector, academia, agriculture, or quality improvement.
Without hypotheses and hypothesis tests, you risk drawing the wrong
conclusions and making bad decisions.

Hypothesis testing is a systematic procedure for deciding whether the results
of a research study support a particular theory.

In this study, we will see how hypothesis testing is used in various cases and
how it guides us in drawing conclusions from research or theory.
Let us first understand a few key terms -

1. Population – A population is the complete set of individuals, items, or
observations whose characteristics and behaviour are to be analysed. It is the
group about which conclusions are drawn on the basis of evidence and
reasoning. For example, the population could be all voters in a country, all
students in a school, or all manufactured products from a factory. Defining the
population clearly helps in controlling variability and bias, leading to more
accurate and reliable results.

2. Sampling - A subset of data selected from the population is known as a
sample. The method or technique used to select it is referred to as sampling.
Sampling can be compared to forming a subset from an original set. Since
analysing the whole population is often impractical or impossible (due to size,
cost, or time constraints), samples are drawn and used to estimate the
characteristics or behaviour of the larger population. There are several
sampling techniques, including simple random sampling, systematic sampling,
stratified sampling, cluster sampling, etc.
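
As a rough illustration of three of the techniques named above, the following sketch uses only Python's standard library. The population of 100 unit IDs, the two strata, and all function names are hypothetical and chosen only for the example.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Draw k items without replacement, each item equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, k, seed=0):
    """Pick a random starting point, then take every step-th item."""
    step = len(population) // k
    start = random.Random(seed).randrange(step)
    return [population[start + i * step] for i in range(k)]

def stratified_sample(strata, fraction, seed=0):
    """Sample the same fraction from every stratum (dict: name -> list)."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

population = list(range(1, 101))                 # hypothetical unit IDs 1..100
strata = {"low": population[:50], "high": population[50:]}

srs = simple_random_sample(population, 10)
sys_s = systematic_sample(population, 10)
strat = stratified_sample(strata, 0.1)
```

Stratified sampling guarantees that each stratum is represented (here 5 units from each half), which simple random sampling does not.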

3. Estimation - The process of drawing inferences about a population using a
sample drawn from it is known as estimation. It makes the problem more
manageable, since the whole population need not be measured. Estimation in
statistics is a method of calculating the value of some property of a
population from the observations in a sample drawn from that population.

4. Estimator – In statistics, an estimator is a function of the sample data (a
sample statistic) that is used to estimate a population parameter. For example,
the sample mean is an estimator of the population mean.

5. Parameter - A parameter is a numerical value that describes a
characteristic of a population. A few common parameters used in testing a
hypothesis are the mean, variance, standard deviation, proportion, etc. It is a
key characteristic of a population, providing a foundation for analysis,
estimation, and decision-making in research.

6. Sample statistic – Sample statistics are numerical values that describe a
sample's characteristics. These values are computed from the sample data.
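
The distinction between a parameter and a sample statistic can be sketched numerically. The simulated population below (mean 50, standard deviation 10) and the sample size of 40 are assumptions made only for the illustration.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population of 10,000 values.
population = [rng.gauss(50, 10) for _ in range(10_000)]

mu = statistics.fmean(population)        # parameter: population mean
sigma = statistics.pstdev(population)    # parameter: population SD (divides by N)

sample = rng.sample(population, 40)
x_bar = statistics.fmean(sample)         # statistic: estimator of mu
s = statistics.stdev(sample)             # statistic: estimator of sigma (divides by n - 1)
```

The parameters `mu` and `sigma` are fixed but usually unknown in practice, while `x_bar` and `s` change from sample to sample.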

7. Parametric Tests - Parametric tests are based on assumptions about the
distribution of the population from which the sample was taken.

Hypothesis testing is the method of using samples to learn more about the
characteristics of a given population. It involves making an assumption about
a population parameter or distribution and testing it further.

Hypothesis testing or significance testing is a method for testing a claim or
hypothesis about a parameter in a population, using data measured in a
sample. In this method, we test a hypothesis by determining how likely it is
that the observed sample statistic would have been obtained if the hypothesis
regarding the population parameter were true.

link - https://www.simplilearn.com/tutorials/statistics-tutorial/hypothesis-testing-in-statistics

When conducting hypothesis testing with parametric data, several assumptions
must be met to ensure the validity of statistical tests. The following are the
key assumptions:
1. Random Sampling
All statistical hypothesis tests assume that the sampling is random.
2. Distribution
The distribution of the population must be known, or the sample size should be
large enough.
3. Normality
The data should be normally distributed, especially for smaller sample sizes.
This means that the distribution of the population from which the samples are
drawn should resemble a bell-shaped curve.
For larger sample sizes (typically n≥30), the Central Limit Theorem suggests
that the sampling distribution of the sample mean will be approximately
normal, even if the data itself is not.
4. Independence
The samples drawn from the populations should be independent of each other.
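
The Central Limit Theorem mentioned under the normality assumption can be checked with a small simulation. This is only a sketch: the exponential population (mean 1, standard deviation 1), the sample size n = 30, and the number of repetitions are all assumed values.

```python
import random
import statistics

rng = random.Random(1)
n, repeats = 30, 5000

# Exponential data are strongly skewed, so the raw data are not bell-shaped.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(repeats)
]

centre = statistics.fmean(sample_means)   # close to the population mean, 1
spread = statistics.stdev(sample_means)   # close to sigma / sqrt(n) = 1 / sqrt(30)
```

Although the individual observations are skewed, the sample means cluster around the population mean with a spread close to σ/√n, which is what the theorem predicts.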
A statistical hypothesis is a statement or assertion about a population
parameter or about the distribution characterizing a population, which we want
to verify based on information available from the sample. If the statistical
hypothesis specifies the population completely, it is termed a simple
statistical hypothesis; otherwise it is called a composite statistical
hypothesis. A statistical hypothesis is of two types, namely -
i) Null Hypothesis
ii) Alternate Hypothesis

A statistician or decision-maker should be completely impartial and should not
let personal opinion influence the decision. A decision-maker should take a
neutral attitude towards the outcome of the test.
The Null Hypothesis is a statement of no effect or no significant difference
between the population parameter and its hypothesized value.
Even if a difference is observed in the sample, it is attributed merely to
fluctuation in sampling from the same population. It is a statement about a
population parameter that is assumed to be true until the evidence indicates
otherwise.
⇒ H₀: µ=µ₀
where H₀ is the null hypothesis, µ is the population mean, and
µ₀ is the hypothesized value of the population mean.

The main reason for testing the null hypothesis is that we suspect it may be
wrong. Whether the null hypothesis is accepted or rejected is what matters to
the decision-maker, hence the suspicion is put to the test.
The alternate hypothesis is a statement about the population parameter that
contradicts the null hypothesis: it states that there is a significant
difference between the population parameter and the hypothesized value.
The alternate hypothesis is accepted when the null hypothesis is rejected and
is rejected when the null hypothesis is accepted.
⇒ H₁: µ>µ₀ or µ<µ₀ or µ≠µ₀
where H₁ is the alternate hypothesis, µ is the population mean,
and µ₀ is the value stated in the null hypothesis.
Level of significance (α)
The level of significance is the probability threshold that measures how strong
the evidence from the sample must be before the null hypothesis is rejected. It
is the maximum probability we are willing to accept of rejecting the null
hypothesis when it is in fact true. This level is fixed before conducting the
experiment. It is also known as alpha or α. Since the significance level is a
probability, it ranges from 0 to 1.

Confidence level
For scientific experiments, confidence levels are used to express confidence in
the output of the experiment. The confidence level is expressed as a
percentage. It is the proportion of repeated experiments in which the computed
confidence interval would contain the true population parameter. If the
confidence level is 95%, this means that if we were to repeat our experiment
100 times and compute the 100 corresponding confidence intervals,
approximately 95 of them would contain the population mean.
Confidence level = 1 - α
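
The repeated-experiment reading of a 95% confidence level can be illustrated by simulation. The sketch below assumes a normal population with known standard deviation; the values of `mu`, `sigma`, `n`, and the number of repetitions are hypothetical.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(7)
mu, sigma, n, alpha = 100.0, 15.0, 25, 0.05     # hypothetical population values

z = NormalDist().inv_cdf(1 - alpha / 2)         # about 1.96 for a 95% level
half_width = z * sigma / n ** 0.5               # half-width of each interval

repeats, covered = 2000, 0
for _ in range(repeats):
    x_bar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
    # Does this experiment's interval contain the true mean?
    if x_bar - half_width <= mu <= x_bar + half_width:
        covered += 1

coverage = covered / repeats                    # expected to be near 1 - alpha
```

The fraction of intervals that contain the true mean should come out close to 1 - α, in line with the relation Confidence level = 1 - α.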

Critical Region
The region of a set of values for a test statistic that leads to the rejection of the
null hypothesis is called the critical region or rejection region.

Acceptance Region
The region of a set of values for a test statistic that leads to the acceptance of
the null hypothesis is called the acceptance region.

*(bell shaped curve with rejection and acceptance region)

The decision to accept or reject the null hypothesis H₀ is made based on
information available from the observation of the sample. The conclusion drawn
in this way is not necessarily always true for the population. This leads to
errors in decision-making. These errors can be of two types, namely -
Type I Error – The error of rejecting the null hypothesis when it is true is
known as a Type I error.
The probability of a Type I error is denoted by α, which is the level of
significance of the test.
Type II Error – The error of accepting the null hypothesis when it is false is
known as a Type II error.
The probability of a Type II error is denoted by β.
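
To see that the probability of a Type I error matches α, one can simulate many samples for which H₀ is actually true and count how often a two-tailed z-test rejects it. This is a sketch under assumed values (standard normal population, n = 20, 4000 trials), not a general proof.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(3)
mu0, sigma, n, alpha = 0.0, 1.0, 20, 0.05       # hypothetical settings
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value

trials, false_rejections = 4000, 0
for _ in range(trials):
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]   # H0 is true here
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    if abs(z) > z_crit:                                  # wrongly reject H0
        false_rejections += 1

type_i_rate = false_rejections / trials         # expected to be near alpha
```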

Power of a test
In hypothesis testing, the power of a test is the probability of avoiding a
Type II error, i.e. of correctly rejecting a null hypothesis that is false.
power of a test = 1 - β
It ranges from 0 to 1 since it is the probability of rejecting a false null
hypothesis. Factors such as the significance level, the sample size, etc.
affect the power. The closer the power of a test is to 1, the better the test.

Note: For a given level of significance α, a statistician or decision-maker
should choose a test that minimizes β or, equivalently, maximizes the power
i.e. 1 – β.
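
The relation power = 1 - β follows because, when H₀ is false, the test either rejects H₀ (probability equal to the power) or accepts it (probability β), and these two probabilities add up to 1. The sketch below computes the power of an upper-tailed z-test with known σ; the function name and the numerical values are assumptions for illustration.

```python
from statistics import NormalDist

def power_upper_tail_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the z-test of H0: mu = mu0 against H1: mu > mu0 when the true mean is mu1."""
    se = sigma / n ** 0.5
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # reject H0 if x_bar > cutoff
    beta = NormalDist(mu1, se).cdf(cutoff)                # P(accept H0 | true mean is mu1)
    return 1 - beta

p25 = power_upper_tail_z(100, 105, 15, 25)     # roughly 0.51
p100 = power_upper_tail_z(100, 105, 15, 100)   # roughly 0.95
```

With the other values held fixed, increasing the sample size from 25 to 100 raises the power considerably, which illustrates the effect of sample size on power.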

Single-Tailed Test
A one-tailed test is a statistical test in which the critical area of a
distribution is one-sided, so that it is either greater than or less than a
certain value, but not both. If the test statistic computed from the sample
falls into the one-sided critical area, the alternative hypothesis is accepted
instead of the null hypothesis. The one-tailed test gets its name from testing
the area under one of the tails of a normal distribution.
*Graph of a single tail

Two-Tailed Test or Double – Tailed Test

A two-tailed test, in statistics, is a method in which the critical area of a
distribution is two-sided and tests whether a sample value is greater than or
less than a certain range of values. If the test statistic computed from the
sample falls into either of the critical areas, the alternative hypothesis is
accepted instead of the null hypothesis. The two-tailed test gets its name from
testing the area under both tails of a normal distribution.
*Graph of 2 tail
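
The difference between the one-sided and two-sided critical regions can be sketched for a standard normal test statistic. The helper name `in_critical_region` and the choice α = 0.05 are assumptions for this example.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()
one_tailed_cut = std_normal.inv_cdf(1 - alpha)       # about 1.645
two_tailed_cut = std_normal.inv_cdf(1 - alpha / 2)   # about 1.960

def in_critical_region(z, tail="two"):
    """Return True if the z statistic falls in the rejection region."""
    if tail == "upper":
        return z > one_tailed_cut
    if tail == "lower":
        return z < -one_tailed_cut
    return abs(z) > two_tailed_cut
```

For instance, z = 1.8 lies in the critical region of an upper-tailed test but not in that of a two-tailed test at the same α, because the two-tailed test splits α between both tails.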
A p-value is the probability of obtaining a sample outcome at least as extreme
as the one observed, given that the value stated in the null hypothesis is
true. It ranges from 0 to 1. Typically, the threshold for this probability is
set at 5%, i.e. a 5% level of significance (95% confidence level). If the
p-value is small (at most α), we reject the null hypothesis, and if the p-value
is large, we accept the null hypothesis. A p-value can be obtained using test
distribution tables.
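
Instead of reading a table, the p-value of a z statistic can be computed from the standard normal distribution. This is a minimal sketch; the function name `z_p_value` and its `tail` argument are hypothetical.

```python
from statistics import NormalDist

def z_p_value(z, tail="two"):
    """p-value of a z statistic under the standard normal distribution."""
    cdf = NormalDist().cdf
    if tail == "upper":
        return 1 - cdf(z)             # P(Z >= z)
    if tail == "lower":
        return cdf(z)                 # P(Z <= z)
    return 2 * (1 - cdf(abs(z)))      # both tails

p_two = z_p_value(1.96)               # about 0.05
p_upper = z_p_value(1.645, "upper")   # about 0.05
```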

Degree of Freedom (df)

The number of independent items in a data sample that is randomly selected
from the population is known as the degree of freedom. Usually, the degree of
freedom is one less than the total number of items in the sample data, because
one value is fixed once the sample mean has been calculated. The degree of
freedom determines which tabulated distribution is used for the test.
df = N – 1
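
The reason one degree of freedom is lost can be seen from the fact that deviations from the sample mean always add up to zero, so the last deviation is determined by the others. The four observations below are hypothetical.

```python
from statistics import fmean

sample = [4.0, 7.0, 9.0, 12.0]                 # hypothetical observations
x_bar = fmean(sample)                          # 8.0
deviations = [x - x_bar for x in sample]       # -4, -1, 1, 4

# Once the first N - 1 deviations are known, the last one is forced.
implied_last = -sum(deviations[:-1])
df = len(sample) - 1
```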

A method of comparing a claim or statement observed regarding the
population with the help of a sample drawn from the same population is
known as the testing of a hypothesis. Statistical tests are used in hypothesis
testing which can be used to determine whether a predictor or estimator
variable has a statistically significant relationship with an outcome.

Steps involved in Testing of Hypothesis-

1. Formulate the null hypothesis

From the gathered sample data, identify the characteristic to be studied and
state the null hypothesis. The null hypothesis claims that there is no
significant difference from the population parameter; the alternate hypothesis
claims that there is one.

2. Choose the appropriate test

Depending on the given or collected sample data, choose the test for further
observation and inference. The choice depends on the distribution of the sample
data, the sample size, and the design of the study (for example, one sample,
two independent samples, or paired samples).

3. Calculate the p-value

Perform the statistical test to compute the test statistic and calculate the
p-value. The p-value is calculated using the sampling distribution of the test
statistic, the sample data and the type of test being performed.
The method of calculation of the p-value is different for different tests.

4. Make a decision
Once the p-value is calculated, compare it with the significance level (α).
Equivalently, compare the calculated test statistic with the tabulated
(critical) value and check whether it lies in the rejection region (or critical
region) or the acceptance region.
Reject the null hypothesis H₀ if the p-value is less than or equal to α, i.e.
the test statistic lies in the rejection region (accept the alternate
hypothesis H₁). Accept the null hypothesis H₀ if the p-value is greater than α,
i.e. the test statistic lies in the acceptance region (reject the alternate
hypothesis H₁).

5. Conclusion
Summarize whether the null hypothesis is accepted or rejected.
According to the decision, a conclusion is drawn as to whether there is a
significant difference between the sample statistic and the population
parameter.
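
The five steps above can be sketched end to end for a two-tailed one-sample z-test. This is only an illustration: it assumes that the population standard deviation is known, and the function name, the data, and the values of `mu0` and `sigma` are hypothetical.

```python
from statistics import NormalDist, fmean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    # Step 1: H0: mu = mu0 against H1: mu != mu0.
    # Step 2: a z-test is chosen, assuming sigma is known.
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    # Step 3: two-tailed p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: decision by comparing the p-value with alpha.
    reject = p_value <= alpha
    # Step 5: conclusion in words.
    conclusion = "reject H0" if reject else "accept H0"
    return z, p_value, conclusion

weights = [52, 48, 55, 51, 53, 50, 54, 56, 49]      # hypothetical data, mean 52
z, p_value, conclusion = one_sample_z_test(weights, mu0=50, sigma=3)
```

Here z = 2.0 and the p-value is about 0.046, so H₀ is rejected at α = 0.05 but would be accepted at α = 0.01, which shows why α must be fixed before the test.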

We have already discussed that hypothesis testing is a method used to decide
whether the null hypothesis should be accepted or rejected. Hypothesis testing
is used in various fields. The following are some of its uses:
1. Evaluate the strength of a claim

It helps to determine whether the evidence supports a specific claim or theory,
ensuring that conclusions are based on empirical data rather than assumptions.
Hypothesis testing is a way of assessing the strength of a claim or assumption
before using it in a data set.

2. Improve quantitative research / Research Advancement

A formalized hypothesis will force us to think about what results we should
look for in an experiment. Hypothesis testing can help researchers generalize
their results to a larger population, rather than just the sample they studied. It
helps the researcher to identify gaps or limitations in their data or analysis.
Hypothesis testing allows the researchers to determine whether the data from
the sample is statistically significant.

3. Business Analysis
By being more data-driven, a business can improve by identifying new
opportunities. Hypothesis testing helps in optimizing marketing strategies and
in ensuring product quality. For example, an e-commerce company can use
hypothesis testing to compare sales data from customers who received free
shipping offers and those who didn't.
Hypothesis testing can help businesses reduce the risk of costly mistakes by
basing business decisions on data, not hunches.
In manufacturing, hypothesis testing can help ensure that production
processes are within specified limits. In financial analysis, hypothesis testing
can help measure investment performance and identify anomalies in financial
4. Understanding Relationships
Hypothesis testing helps in understanding relationships between variables,
identifying patterns, and drawing inferences based on statistical evidence. It
helps in identifying whether there is a significant relationship between
variables and how strong that relationship is.

5. Decision Making
With the help of hypothesis testing, one can avoid making errors while decision
making. Hypothesis testing helps analysts and researchers to make informed
decisions based on the evidence.
In statistics, various tests are used to compare different samples or groups and
draw conclusions about populations using the sample drawn. These tests,
known as statistical tests, focus on analysing the likelihood or probability of
obtaining the observed data under specific assumptions or hypotheses. They
provide a framework for assessing evidence in support of or against a particular
hypothesis.
There are different statistical tests, such as -
i) T–test
ii) Z – test
iii) F-Test
iv) Chi-square test
The term "t-statistic" is abbreviated from "hypothesis test statistic".
The t-test is named after Student’s t-distribution, which William Sealy Gosset
first published in English in 1908 in the scientific journal Biometrika under
the pen name “Student”.
Although the name "Student" comes from Gosset's pseudonym, it was actually
through the work of Ronald Fisher that the distribution became well known as
"Student's distribution" and the test as "Student's t-test".

A t-test is a type of inferential statistical test used to determine whether
there is a significant difference between the means of two groups or whether
the observed difference is just due to random variation.

It is often used when the data are normally distributed and the population
variance is unknown. The t-test is a parametric test, meaning it makes certain
assumptions about the data. (Assumptions are mentioned on page no. --)
In many cases, a Z-test will yield very similar results to a t-test because the
latter converges to the former as the size of the dataset increases.

In the case of the one-sample t-test, the degree of freedom (df) = n − 1, where
n is the number of observations in the sample. The df reflects the number of
values in the sample that are free to vary after estimating the sample mean.

1. One-Sample t-test
2. Two-sample t-test
3. Paired sample t-test

A one-sample t-test is a statistical hypothesis test that compares the mean of
a sample to a known or hypothesized value of the population mean. It is also
known as a single-sample t-test.

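We can use this when the sample size is small (i.e. n < 30), the data are
collected randomly, and they are approximately normally distributed. The test
statistic can be calculated as:

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$

where
t = t-value
x̄ = sample mean
μ = population mean (the value stated in the null hypothesis)
s = sample standard deviation (used as the estimate of the unknown population standard deviation σ)
n = sample size

$$ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} $$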

Result: The null hypothesis is rejected if the absolute value of the calculated t-value is
greater than the tabulated t-value. In that case, we can conclude that there exists a
significant difference. Otherwise (i.e. when the computed t-value is less than the
tabulated t-value) the null hypothesis is accepted.
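
A minimal sketch of the one-sample t-test follows, using only the standard library. Python's `statistics` module has no t-distribution, so the tabulated t-value is typed in by hand (2.262 for df = 9 at α = 0.05, two-tailed). The data and the hypothesized mean of 12 are hypothetical.

```python
from statistics import fmean, stdev

def one_sample_t(sample, mu0):
    """Return the t statistic and its degrees of freedom."""
    n = len(sample)
    s = stdev(sample)                            # sample SD, divides by n - 1
    t = (fmean(sample) - mu0) / (s / n ** 0.5)
    return t, n - 1

sample = [12, 14, 15, 11, 13, 16, 14, 12, 15, 13]   # hypothetical data, mean 13.5
t_value, df = one_sample_t(sample, mu0=12)

t_tabulated = 2.262        # from a t-table: df = 9, alpha = 0.05, two-tailed
reject_h0 = abs(t_value) > t_tabulated
```

For these data the standard error works out to 0.5, so t = (13.5 − 12) / 0.5 = 3.0, which exceeds the tabulated value and leads to rejection of H₀.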

A two-sample t-test, also known as an unpaired or independent sample t-test,
is used to determine whether the difference between two groups is significant
or just a random occurrence.
We can use this when:
I. The population mean or standard deviation is unknown.
(information about the population is unknown)
II. The two samples are separate/independent.
E.g. boys and girls (the two are independent of each other)
In this case, the degree of freedom (df) = n₁ + n₂ − 2, where n₁ and n₂ are the
sample sizes of the two groups. This is because two parameters (the two group
means) are estimated in a two-sample t-test.
The t-value can be calculated using:
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

where,
x̄₁ and x̄₂ are the means of the two sample groups,
s₁ and s₂ are the standard deviations of the two sample groups, and
n₁ and n₂ are the sample sizes of the two groups.
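
The two-sample formula can be sketched in the same way. The two groups of scores are hypothetical, and the tabulated t-value (2.101 for df = 18 at α = 0.05, two-tailed) is again entered by hand because the standard library has no t-distribution.

```python
from statistics import fmean, variance

def two_sample_t(group1, group2):
    """t statistic for two independent samples, with df = n1 + n2 - 2."""
    n1, n2 = len(group1), len(group2)
    # variance() is the sample variance s^2 (divides by n - 1).
    se = (variance(group1) / n1 + variance(group2) / n2) ** 0.5
    t = (fmean(group1) - fmean(group2)) / se
    return t, n1 + n2 - 2

boys = [12, 14, 15, 11, 13, 16, 14, 12, 15, 13]     # hypothetical scores, mean 13.5
girls = [10, 12, 13, 9, 11, 14, 12, 10, 13, 11]     # hypothetical scores, mean 11.5
t_value, df = two_sample_t(boys, girls)

t_tabulated = 2.101        # from a t-table: df = 18, alpha = 0.05, two-tailed
reject_h0 = abs(t_value) > t_tabulated
```

Here t is about 2.83 with 18 degrees of freedom, which exceeds the tabulated value, so the difference between the two group means would be judged significant at the 5% level.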
