
Department of Information Technology, Vardhaman College of Engineering, Hyderabad.
Web: www.vardhaman.org

Unit-5
Course contents:
• Multiple Testing: Introduction

• Hypothesis testing, Tests for Proportions

• One Sample Tests for Means and Variances

• Two-Sample Tests for Means and Variances

• Other Hypothesis Tests

• Analysis of Variance

• Sample Size and Power

Faculty Name: Dr. Saroja Kumar Rout

Department: Information Technology

Course Name: Predictive Analytics

Course Code: A6608




Multiple Testing: Introduction


Multiple testing refers to the situation where many statistical tests are conducted simultaneously on the same dataset, which increases the risk of obtaining false-positive results. This phenomenon arises in various scientific and research contexts, particularly in fields such as biostatistics, genomics, and the social sciences, where multiple hypotheses are tested concurrently.

Key Concepts:

1. Familywise Error Rate (FWER):

• The probability of making at least one Type I error (false positive) in a set of tests is
known as the familywise error rate.

• Common methods to control FWER include the Bonferroni correction and the Šidák
correction.

2. False Discovery Rate (FDR):

• FDR is the expected proportion of false positives among all rejected null hypotheses.

• Controlling FDR is often considered less conservative than controlling FWER and is
suitable when a balance between discovery and control is needed.

3. Types of Errors:

• Type I Error (False Positive): Incorrectly rejecting a true null hypothesis.

• Type II Error (False Negative): Failing to reject a false null hypothesis.

4. Common Procedures for Multiple Testing (see the sketch after this list):

• Bonferroni Correction: Adjusts the significance level by dividing it by the number of tests being conducted.

• Holm's Method: Similar to Bonferroni but potentially more powerful.

• False Discovery Rate (FDR) Control: Methods like the Benjamini-Hochberg procedure focus on controlling the proportion of false discoveries.
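
A minimal sketch of these three corrections in Python, assuming the statsmodels package is available; the p-values are hypothetical, chosen only for illustration:

    from statsmodels.stats.multitest import multipletests

    # Hypothetical raw p-values from ten simultaneous tests.
    p_values = [0.001, 0.008, 0.020, 0.035, 0.041,
                0.060, 0.120, 0.300, 0.450, 0.800]

    # 'bonferroni', 'holm', and 'fdr_bh' (Benjamini-Hochberg) correspond
    # to the three procedures listed above.
    for method in ["bonferroni", "holm", "fdr_bh"]:
        reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method=method)
        print(method, "rejects", int(reject.sum()), "of", len(p_values), "hypotheses")

Because Holm is at least as powerful as Bonferroni, and fdr_bh controls the FDR rather than the stricter FWER, the number of rejections is typically non-decreasing across the three methods.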

Challenges:

1. Increased Type I Error Rate:

• Conducting multiple tests without appropriate adjustments can inflate the likelihood
of obtaining statistically significant results purely by chance.

2. Complexity with Dependent Tests:


• When tests are not independent, adjusting for multiple testing becomes more
intricate. Methods like the False Discovery Rate can be more suitable in such
scenarios.

3. Balancing Discovery and Control:

• There is often a trade-off between controlling the risk of false discoveries and having
enough power to detect true effects. Striking the right balance is essential.

Applications:

1. Genomics:

• Analyzing multiple genes or genetic markers simultaneously.

• Genome-wide association studies (GWAS) involve testing associations between genetic variants and traits.

2. Clinical Trials:

• Testing the efficacy of multiple treatments or interventions.

• Monitoring multiple endpoints in a clinical trial.

3. Economics and Social Sciences:

• Testing multiple hypotheses in economic studies, social experiments, or surveys.

4. Quality Control:

• Testing multiple characteristics of a product or process simultaneously.

Understanding and appropriately addressing the challenges of multiple testing are crucial for
ensuring the reliability and validity of statistical inferences in various research domains. Choosing the
right correction method depends on the specific goals of the analysis and the nature of the data
being investigated.

Hypothesis testing

Hypothesis testing is a formal procedure for using sample data to decide between a null hypothesis (H0) and an alternative hypothesis (H1). A test statistic computed from the data is compared against a reference distribution, and H0 is rejected when the observed result would be sufficiently unlikely under H0, i.e., when the p-value falls below the chosen significance level α.


Explain Hypothesis Testing: Z-Test

The z-test compares a sample statistic to a hypothesized population value when the population standard deviation is known or the sample size is large. For a single mean, the test statistic is z = (x̄ − μ0) / (σ / √n), which follows the standard normal distribution under H0.
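
A minimal sketch of a one-sample z-test in Python, with hypothetical numbers; scipy is assumed only for the normal distribution:

    import math
    from scipy.stats import norm

    sample_mean = 52.0   # hypothetical sample mean
    mu0 = 50.0           # hypothesized population mean under H0
    sigma = 8.0          # population standard deviation, assumed known
    n = 64               # sample size

    z = (sample_mean - mu0) / (sigma / math.sqrt(n))   # z = 2.0
    p_two_sided = 2 * norm.sf(abs(z))                  # two-sided p ~= 0.0455

    print(f"z = {z:.3f}, p = {p_two_sided:.4f}")
    # p < 0.05, so H0: mu = 50 would be rejected at the 5% level.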

SAROJA/LECTURE NOTE/VMEG/PREDICTIVE ANALYTICS/UNIT-5


Department of Information Technology, Vardhaman College of Engineering, Hyderabad.
Web:(www.vardhaman.org)

Explain Hypothesis Testing: t-Test



The t-test is a statistical method used to determine if there is a significant difference between the means of two groups. It is particularly useful when dealing with small sample sizes. There are different variations of the t-test, including the independent samples t-test and the paired samples t-test.
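
A minimal sketch of both variants in Python, assuming scipy is available; all data values are hypothetical:

    from scipy.stats import ttest_ind, ttest_rel

    # Independent samples t-test: two unrelated groups.
    group_a = [23, 20, 25, 22, 24, 21]   # hypothetical scores, group A
    group_b = [27, 26, 24, 28, 25, 29]   # hypothetical scores, group B
    t_ind, p_ind = ttest_ind(group_a, group_b)

    # Paired samples t-test: the same subjects measured twice.
    before = [120, 115, 130, 140, 125]
    after  = [118, 112, 128, 135, 121]
    t_rel, p_rel = ttest_rel(before, after)

    print(f"independent: t = {t_ind:.3f}, p = {p_ind:.4f}")
    print(f"paired:      t = {t_rel:.3f}, p = {p_rel:.4f}")

The paired version tests the mean of the within-subject differences, which is why the two samples must be matched observation by observation.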


Explain Chi-Square Test


The Chi-Square Test is a statistical test used to determine if there is a significant association between
two categorical variables. It's often used when the data consists of counts or frequencies within
different categories. There are different variants of the Chi-Square Test, including the Chi-Square Test
for Independence and the Chi-Square Test for Goodness of Fit.

1. Chi-Square Test for Independence:

Scenario:

• Used when you have two categorical variables from a single population, and you want to
determine if there is a significant association between them.
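
A minimal sketch of both variants in Python, assuming scipy is available; all counts are hypothetical:

    from scipy.stats import chi2_contingency, chisquare

    # Test for independence: a hypothetical 2x2 contingency table
    # (rows: group, columns: preference).
    observed = [[30, 10],
                [20, 40]]
    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"independence: chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")

    # Goodness of fit: observed counts against equal expected frequencies.
    obs = [18, 22, 20, 30, 10]
    stat, p_gof = chisquare(obs)   # default expectation: uniform
    print(f"goodness of fit: chi2 = {stat:.3f}, p = {p_gof:.4f}")

A small p-value in the first test suggests the two categorical variables are associated; in the second, that the observed counts depart from the assumed distribution.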


3. One Sample Tests for Means and Variances

One-sample tests compare a single sample against a hypothesized population value. For the mean, the one-sample t-test (or the z-test when σ is known) uses t = (x̄ − μ0) / (s / √n) with n − 1 degrees of freedom. For the variance, the test statistic (n − 1)s² / σ0² follows a chi-square distribution with n − 1 degrees of freedom under H0.
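
A minimal sketch of both one-sample tests in Python, assuming numpy and scipy are available; the data and hypothesized values are hypothetical:

    import numpy as np
    from scipy.stats import ttest_1samp, chi2

    data = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3])

    # One-sample t-test for the mean: H0: mu = 10.0
    t_stat, p_mean = ttest_1samp(data, popmean=10.0)

    # Chi-square test for the variance: H0: sigma^2 = 0.04
    n = len(data)
    sigma0_sq = 0.04
    stat = (n - 1) * data.var(ddof=1) / sigma0_sq
    p_var = 2 * min(chi2.cdf(stat, df=n - 1), chi2.sf(stat, df=n - 1))  # two-sided

    print(f"mean:     t = {t_stat:.3f}, p = {p_mean:.4f}")
    print(f"variance: chi2 = {stat:.3f}, p = {p_var:.4f}")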


4. Two-Sample Tests for Means and Variances

Two-sample tests compare two independent samples. For the means, the two-sample t-test is used (the pooled version when the variances can be assumed equal, Welch's version otherwise). For the variances, the F-test uses the ratio of the two sample variances, F = s1² / s2², referred to an F distribution with n1 − 1 and n2 − 1 degrees of freedom.
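
A minimal sketch of both two-sample tests in Python, assuming numpy and scipy are available; the samples are hypothetical:

    import numpy as np
    from scipy.stats import ttest_ind, f

    x = np.array([5.1, 4.9, 5.3, 5.0, 5.2, 4.8])
    y = np.array([5.6, 5.4, 5.9, 5.7, 5.5, 6.0])

    # Welch's t-test for the means (does not assume equal variances).
    t_stat, p_means = ttest_ind(x, y, equal_var=False)

    # F-test for equality of variances: F = s_x^2 / s_y^2.
    F = x.var(ddof=1) / y.var(ddof=1)
    dfx, dfy = len(x) - 1, len(y) - 1
    p_vars = 2 * min(f.cdf(F, dfx, dfy), f.sf(F, dfx, dfy))  # two-sided

    print(f"means:     t = {t_stat:.3f}, p = {p_means:.4f}")
    print(f"variances: F = {F:.3f}, p = {p_vars:.4f}")

Welch's version is shown here because it stays valid whether or not the variance test rejects equality.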


5. Analysis of Variance (ANOVA):



Definition:

Analysis of Variance (ANOVA) is a statistical technique used to analyze the differences among group means in a sample. It assesses whether the variability between groups is large relative to the variability within groups.


Key Concepts:

Objective:

ANOVA is used to determine if there are any statistically significant differences between the means of
three or more independent (unrelated) groups.

Variability:

ANOVA decomposes the total variability observed in a set of data into two components: variability
between groups and variability within groups.
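
A minimal sketch of a one-way ANOVA in Python, assuming scipy is available; the three groups are hypothetical:

    from scipy.stats import f_oneway

    group1 = [85, 90, 88, 92, 87]   # hypothetical scores, method 1
    group2 = [78, 82, 80, 79, 81]   # hypothetical scores, method 2
    group3 = [91, 95, 93, 94, 90]   # hypothetical scores, method 3

    # H0: all group means are equal.
    f_stat, p_value = f_oneway(group1, group2, group3)
    print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
    # The F statistic is the between-group variability divided by the
    # within-group variability; a small p-value indicates that at least
    # one group mean differs from the others.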



6. Sample Size and Power


Sample Size:
Sample size refers to the number of individual observations or data points collected in
a particular study. It is a critical aspect of experimental design and statistical analysis. The size
of the sample can significantly influence the reliability and validity of study findings. In general,
a larger sample size tends to provide more accurate and representative results.
Key Considerations for Determining Sample Size:
1. Statistical Power: The ability of a study to detect a true effect when it exists is known
as statistical power. Increasing the sample size generally increases statistical power,
allowing for a better chance of identifying significant effects.
2. Effect Size: The magnitude of the difference or relationship being studied is known as
the effect size. Larger effects can be detected with smaller sample sizes, while smaller
effects may require larger samples to be detected.
3. Significance Level: The significance level (denoted by α) is the probability of rejecting a null hypothesis that is actually true; common choices are 0.05 and 0.01.


4. Variability in the Population: The amount of variability or dispersion in the population being studied affects the precision of estimates. Higher variability may require larger sample sizes.
5. Type I and Type II Error Rates: The acceptable levels of Type I error (false positive) and
Type II error (false negative) influence the sample size calculation.
6. Research Design: The specific research design, including the type of statistical test and
analysis, can impact the required sample size.
Power:
Power is the probability that a statistical test will correctly reject a false null hypothesis
(i.e., the probability of not making a Type II error). In other words, it is the ability of a study to
detect a true effect or relationship if it exists. Power is influenced by several factors, including
sample size, effect size, and the chosen significance level.
Factors Affecting Statistical Power:
1. Sample Size: Increasing the sample size generally increases power.
2. Effect Size: A larger effect size results in higher power.
3. Significance Level (α): Raising the significance level increases power but also
increases the risk of Type I errors.
4. Variability: Lower variability in the population increases power.
Calculating Power:
Power is often calculated during the planning phase of a study using statistical power
analysis. This analysis involves determining the minimum sample size required to achieve a
certain level of power for detecting a specified effect size at a chosen significance level.
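A minimal sketch of such a power analysis for an independent-samples t-test, assuming the statsmodels package is available; the effect size and targets are hypothetical choices:

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Minimum sample size per group to detect a medium effect
    # (Cohen's d = 0.5) with 80% power at alpha = 0.05.
    n_per_group = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05)
    print(f"required n per group: {n_per_group:.1f}")   # roughly 64

    # Conversely, the power actually achieved with 40 subjects per group.
    achieved = analysis.solve_power(effect_size=0.5, nobs1=40, alpha=0.05)
    print(f"power with n = 40:    {achieved:.3f}")
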
Conclusion:
Both sample size and power are crucial considerations in the design and interpretation
of research studies. Determining an appropriate sample size and ensuring adequate power
enhance the reliability and validity of study results. Researchers aim to strike a balance
between practical considerations (such as time and resources) and the need for meaningful
and generalizable findings.

