
Department of Information Technology, Vardhaman College of Engineering, Hyderabad.
Web: www.vardhaman.org

Unit-5
Course contents:
• Multiple Testing: Introduction

• Hypothesis testing, Tests for Proportions

• One Sample Tests for Means and Variances

• Two-Sample Tests for Means and Variances

• Other Hypothesis Tests

• Analysis of Variance

• Sample Size and Power

Faculty Name: Dr. Saroja Kumar Rout

Department: Information Technology

Course Name: Predictive Analytics

Course Code: A6608




Multiple Testing: Introduction


Multiple testing refers to the situation where many statistical tests are conducted simultaneously on the same dataset, which increases the risk of obtaining false-positive results. This phenomenon arises in various scientific and research contexts, particularly in fields such as biostatistics, genomics, and the social sciences, where multiple hypotheses are tested concurrently.

Key Concepts:

1. Familywise Error Rate (FWER):

• The probability of making at least one Type I error (false positive) in a set of tests is
known as the familywise error rate.

• Common methods to control FWER include the Bonferroni correction and the Šidák
correction.

2. False Discovery Rate (FDR):

• FDR is the expected proportion of false positives among all rejected null hypotheses.

• Controlling FDR is often considered less conservative than controlling FWER and is
suitable when a balance between discovery and control is needed.

3. Types of Errors:

• Type I Error (False Positive): Incorrectly rejecting a true null hypothesis.

• Type II Error (False Negative): Failing to reject a false null hypothesis.

4. Common Procedures for Multiple Testing (see the sketch after this list):

• Bonferroni Correction: Adjusts the significance level by dividing it by the number of tests being conducted.

• Holm's Method: Similar to Bonferroni but potentially more powerful.

• False Discovery Rate (FDR) Control: Methods like the Benjamini-Hochberg procedure focus on controlling the proportion of false discoveries.
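
A minimal sketch of these three corrections in Python, assuming the statsmodels package is available; the p-values are hypothetical, chosen only for illustration:

    from statsmodels.stats.multitest import multipletests

    # Hypothetical raw p-values from ten simultaneous tests.
    p_values = [0.001, 0.008, 0.020, 0.035, 0.041,
                0.060, 0.120, 0.300, 0.450, 0.800]

    # 'bonferroni', 'holm', and 'fdr_bh' (Benjamini-Hochberg) correspond
    # to the three procedures listed above.
    for method in ["bonferroni", "holm", "fdr_bh"]:
        reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method=method)
        print(method, "rejects", int(reject.sum()), "of", len(p_values), "hypotheses")

Because Holm is at least as powerful as Bonferroni, and fdr_bh controls the FDR rather than the stricter FWER, the number of rejections is typically non-decreasing across the three methods.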

Challenges:

1. Increased Type I Error Rate:

• Conducting multiple tests without appropriate adjustments can inflate the likelihood
of obtaining statistically significant results purely by chance.

2. Complexity with Dependent Tests:


• When tests are not independent, adjusting for multiple testing becomes more
intricate. Methods like the False Discovery Rate can be more suitable in such
scenarios.

3. Balancing Discovery and Control:

• There is often a trade-off between controlling the risk of false discoveries and having
enough power to detect true effects. Striking the right balance is essential.

Applications:

1. Genomics:

• Analyzing multiple genes or genetic markers simultaneously.

• Genome-wide association studies (GWAS) involve testing associations between genetic variants and traits.

2. Clinical Trials:

• Testing the efficacy of multiple treatments or interventions.

• Monitoring multiple endpoints in a clinical trial.

3. Economics and Social Sciences:

• Testing multiple hypotheses in economic studies, social experiments, or surveys.

4. Quality Control:

• Testing multiple characteristics of a product or process simultaneously.

Understanding and appropriately addressing the challenges of multiple testing are crucial for
ensuring the reliability and validity of statistical inferences in various research domains. Choosing the
right correction method depends on the specific goals of the analysis and the nature of the data
being investigated.

Hypothesis testing

Hypothesis testing is a formal procedure for using sample data to decide between a null hypothesis (H0) and an alternative hypothesis (H1). A test statistic computed from the data is compared against a reference distribution, and H0 is rejected when the observed result would be sufficiently unlikely under H0, i.e., when the p-value falls below the chosen significance level α.


Explain Hypothesis Testing: Z-Test

The z-test compares a sample statistic to a hypothesized population value when the population standard deviation is known or the sample size is large. For a single mean, the test statistic is z = (x̄ − μ0) / (σ / √n), which follows the standard normal distribution under H0.
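
A minimal sketch of a one-sample z-test in Python, with hypothetical numbers; scipy is assumed only for the normal distribution:

    import math
    from scipy.stats import norm

    sample_mean = 52.0   # hypothetical sample mean
    mu0 = 50.0           # hypothesized population mean under H0
    sigma = 8.0          # population standard deviation, assumed known
    n = 64               # sample size

    z = (sample_mean - mu0) / (sigma / math.sqrt(n))   # z = 2.0
    p_two_sided = 2 * norm.sf(abs(z))                  # two-sided p ~= 0.0455

    print(f"z = {z:.3f}, p = {p_two_sided:.4f}")
    # p < 0.05, so H0: mu = 50 would be rejected at the 5% level.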

SAROJA/LECTURE NOTE/VMEG/PREDICTIVE ANALYTICS/UNIT-5


Department of Information Technology, Vardhaman College of Engineering, Hyderabad.
Web:(www.vardhaman.org)

Explain Hypothesis Testing: t-Test



The t-test is a statistical method used to determine if there is a significant difference between the means of two groups. It is particularly useful when dealing with small sample sizes. There are different variations of the t-test, including the independent samples t-test and the paired samples t-test.
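
A minimal sketch of both variants in Python, assuming scipy is available; all data values are hypothetical:

    from scipy.stats import ttest_ind, ttest_rel

    # Independent samples t-test: two unrelated groups.
    group_a = [23, 20, 25, 22, 24, 21]   # hypothetical scores, group A
    group_b = [27, 26, 24, 28, 25, 29]   # hypothetical scores, group B
    t_ind, p_ind = ttest_ind(group_a, group_b)

    # Paired samples t-test: the same subjects measured twice.
    before = [120, 115, 130, 140, 125]
    after  = [118, 112, 128, 135, 121]
    t_rel, p_rel = ttest_rel(before, after)

    print(f"independent: t = {t_ind:.3f}, p = {p_ind:.4f}")
    print(f"paired:      t = {t_rel:.3f}, p = {p_rel:.4f}")

The paired version tests the mean of the within-subject differences, which is why the two samples must be matched observation by observation.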


Explain Chi-Square Test


The Chi-Square Test is a statistical test used to determine if there is a significant association between
two categorical variables. It's often used when the data consists of counts or frequencies within
different categories. There are different variants of the Chi-Square Test, including the Chi-Square Test
for Independence and the Chi-Square Test for Goodness of Fit.

1. Chi-Square Test for Independence:

Scenario:

• Used when you have two categorical variables from a single population, and you want to
determine if there is a significant association between them.
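
A minimal sketch of both variants in Python, assuming scipy is available; all counts are hypothetical:

    from scipy.stats import chi2_contingency, chisquare

    # Test for independence: a hypothetical 2x2 contingency table
    # (rows: group, columns: preference).
    observed = [[30, 10],
                [20, 40]]
    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"independence: chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")

    # Goodness of fit: observed counts against equal expected frequencies.
    obs = [18, 22, 20, 30, 10]
    stat, p_gof = chisquare(obs)   # default expectation: uniform
    print(f"goodness of fit: chi2 = {stat:.3f}, p = {p_gof:.4f}")

A small p-value in the first test suggests the two categorical variables are associated; in the second, that the observed counts depart from the assumed distribution.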


3. One Sample Tests for Means and Variances

One-sample tests compare a single sample against a hypothesized population value. For the mean, the one-sample t-test (or the z-test when σ is known) uses t = (x̄ − μ0) / (s / √n) with n − 1 degrees of freedom. For the variance, the test statistic (n − 1)s² / σ0² follows a chi-square distribution with n − 1 degrees of freedom under H0.
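
A minimal sketch of both one-sample tests in Python, assuming numpy and scipy are available; the data and hypothesized values are hypothetical:

    import numpy as np
    from scipy.stats import ttest_1samp, chi2

    data = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3])

    # One-sample t-test for the mean: H0: mu = 10.0
    t_stat, p_mean = ttest_1samp(data, popmean=10.0)

    # Chi-square test for the variance: H0: sigma^2 = 0.04
    n = len(data)
    sigma0_sq = 0.04
    stat = (n - 1) * data.var(ddof=1) / sigma0_sq
    p_var = 2 * min(chi2.cdf(stat, df=n - 1), chi2.sf(stat, df=n - 1))  # two-sided

    print(f"mean:     t = {t_stat:.3f}, p = {p_mean:.4f}")
    print(f"variance: chi2 = {stat:.3f}, p = {p_var:.4f}")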


4. Two-Sample Tests for Means and Variances

Two-sample tests compare two independent samples. For the means, the two-sample t-test is used (the pooled version when the variances can be assumed equal, Welch's version otherwise). For the variances, the F-test uses the ratio of the two sample variances, F = s1² / s2², referred to an F distribution with n1 − 1 and n2 − 1 degrees of freedom.
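
A minimal sketch of both two-sample tests in Python, assuming numpy and scipy are available; the samples are hypothetical:

    import numpy as np
    from scipy.stats import ttest_ind, f

    x = np.array([5.1, 4.9, 5.3, 5.0, 5.2, 4.8])
    y = np.array([5.6, 5.4, 5.9, 5.7, 5.5, 6.0])

    # Welch's t-test for the means (does not assume equal variances).
    t_stat, p_means = ttest_ind(x, y, equal_var=False)

    # F-test for equality of variances: F = s_x^2 / s_y^2.
    F = x.var(ddof=1) / y.var(ddof=1)
    dfx, dfy = len(x) - 1, len(y) - 1
    p_vars = 2 * min(f.cdf(F, dfx, dfy), f.sf(F, dfx, dfy))  # two-sided

    print(f"means:     t = {t_stat:.3f}, p = {p_means:.4f}")
    print(f"variances: F = {F:.3f}, p = {p_vars:.4f}")

Welch's version is shown here because it stays valid whether or not the variance test rejects equality.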


5. Analysis of Variance (ANOVA):



Definition:

Analysis of Variance (ANOVA) is a statistical technique used to analyze the differences among group means in a sample. It assesses whether the variability between groups is large relative to the variability within groups.


Key Concepts:

Objective:

ANOVA is used to determine if there are any statistically significant differences between the means of
three or more independent (unrelated) groups.

Variability:

ANOVA decomposes the total variability observed in a set of data into two components: variability
between groups and variability within groups.
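
A minimal sketch of a one-way ANOVA in Python, assuming scipy is available; the three groups are hypothetical:

    from scipy.stats import f_oneway

    group1 = [85, 90, 88, 92, 87]   # hypothetical scores, method 1
    group2 = [78, 82, 80, 79, 81]   # hypothetical scores, method 2
    group3 = [91, 95, 93, 94, 90]   # hypothetical scores, method 3

    # H0: all group means are equal.
    f_stat, p_value = f_oneway(group1, group2, group3)
    print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
    # The F statistic is the between-group variability divided by the
    # within-group variability; a small p-value indicates that at least
    # one group mean differs from the others.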



6. Sample Size and Power


Sample Size:
Sample size refers to the number of individual observations or data points collected in
a particular study. It is a critical aspect of experimental design and statistical analysis. The size
of the sample can significantly influence the reliability and validity of study findings. In general,
a larger sample size tends to provide more accurate and representative results.
Key Considerations for Determining Sample Size:
1. Statistical Power: The ability of a study to detect a true effect when it exists is known
as statistical power. Increasing the sample size generally increases statistical power,
allowing for a better chance of identifying significant effects.
2. Effect Size: The magnitude of the difference or relationship being studied is known as
the effect size. Larger effects can be detected with smaller sample sizes, while smaller
effects may require larger samples to be detected.
3. Significance Level: The significance level (denoted by α) is the probability of rejecting a null hypothesis that is actually true; common choices are 0.05 and 0.01.


4. Variability in the Population: The amount of variability or dispersion in the population being studied affects the precision of estimates. Higher variability may require larger sample sizes.
5. Type I and Type II Error Rates: The acceptable levels of Type I error (false positive) and
Type II error (false negative) influence the sample size calculation.
6. Research Design: The specific research design, including the type of statistical test and
analysis, can impact the required sample size.
Power:
Power is the probability that a statistical test will correctly reject a false null hypothesis
(i.e., the probability of not making a Type II error). In other words, it is the ability of a study to
detect a true effect or relationship if it exists. Power is influenced by several factors, including
sample size, effect size, and the chosen significance level.
Factors Affecting Statistical Power:
1. Sample Size: Increasing the sample size generally increases power.
2. Effect Size: A larger effect size results in higher power.
3. Significance Level (α): Raising the significance level increases power but also
increases the risk of Type I errors.
4. Variability: Lower variability in the population increases power.
Calculating Power:
Power is often calculated during the planning phase of a study using statistical power
analysis. This analysis involves determining the minimum sample size required to achieve a
certain level of power for detecting a specified effect size at a chosen significance level.
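A minimal sketch of such a power analysis for an independent-samples t-test, assuming the statsmodels package is available; the effect size and targets are hypothetical choices:

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Minimum sample size per group to detect a medium effect
    # (Cohen's d = 0.5) with 80% power at alpha = 0.05.
    n_per_group = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05)
    print(f"required n per group: {n_per_group:.1f}")   # roughly 64

    # Conversely, the power actually achieved with 40 subjects per group.
    achieved = analysis.solve_power(effect_size=0.5, nobs1=40, alpha=0.05)
    print(f"power with n = 40:    {achieved:.3f}")
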
Conclusion:
Both sample size and power are crucial considerations in the design and interpretation
of research studies. Determining an appropriate sample size and ensuring adequate power
enhance the reliability and validity of study results. Researchers aim to strike a balance
between practical considerations (such as time and resources) and the need for meaningful
and generalizable findings.

