Lab Kamal Sir

The document presents statistical analysis problems including hypothesis testing, ANOVA, multiple comparison tests, permutation tests, and kernel density estimation. It provides R code solutions for calculating Type I and Type II errors, performing ANOVA and multiple comparisons, conducting permutation tests, and estimating density functions using various kernels. The solutions include generating synthetic data, performing statistical tests, and visualizing results.

Problem 1

A random sample of size 10 is available from a normal distribution with variance 2.5² = 6.25. The null hypothesis H0: µ = 50 is to be tested against the alternative H1: µ = 52, and H0 is accepted if 48.5 < X̄ < 51.5. Obtain the probabilities of Type I and Type II error. Then, by varying the acceptance region and the sample size, show how the two errors and the power of the test change.

Solution:
# Given data
n <- 10 # Sample size
variance <- 2.5^2 # Variance (sigma = 2.5)
sigma <- sqrt(variance) # Standard deviation
mu_null <- 50 # Null hypothesis mean
mu_alt <- 52 # Alternative hypothesis mean
lower_limit <- 48.5 # Lower bound of acceptance region
upper_limit <- 51.5 # Upper bound of acceptance region

# Calculate the standard error of the sample mean
se <- sigma / sqrt(n)

# Calculate Z-scores of the acceptance region boundaries under H0
z_lower <- (lower_limit - mu_null) / se
z_upper <- (upper_limit - mu_null) / se

# Type I Error (α): Probability of rejecting H0 when it is true (H0: mu = 50),
# i.e. the probability that the sample mean falls OUTSIDE the acceptance region
alpha <- 1 - (pnorm(z_upper) - pnorm(z_lower))

# Type II Error (β): Probability of failing to reject H0 when H1 is true (H1: mu = 52)
z_lower_alt <- (lower_limit - mu_alt) / se
z_upper_alt <- (upper_limit - mu_alt) / se
beta <- pnorm(z_upper_alt) - pnorm(z_lower_alt)

# Power of the test = 1 - β
power <- 1 - beta

# Display the results
cat("Type I Error (α):", alpha, "\n")
cat("Type II Error (β):", beta, "\n")
cat("Power of the test:", power, "\n")

# Now, changing sample sizes and acceptance regions

# Define a function to calculate errors and power based on sample size and acceptance region
calculate_errors_and_power <- function(n, lower_limit, upper_limit, mu_null, mu_alt, sigma) {
  se <- sigma / sqrt(n)

  # Type I Error (α): probability the sample mean falls outside the acceptance region under H0
  z_lower <- (lower_limit - mu_null) / se
  z_upper <- (upper_limit - mu_null) / se
  alpha <- 1 - (pnorm(z_upper) - pnorm(z_lower))

  # Type II Error (β): probability the sample mean falls inside the acceptance region under H1
  z_lower_alt <- (lower_limit - mu_alt) / se
  z_upper_alt <- (upper_limit - mu_alt) / se
  beta <- pnorm(z_upper_alt) - pnorm(z_lower_alt)

  # Power of the test
  power <- 1 - beta

  return(c(alpha = alpha, beta = beta, power = power))
}

# Test for various sample sizes and acceptance regions
sample_sizes <- c(10, 20, 30, 40, 50)
acceptance_regions <- list(c(48.5, 51.5), c(48, 52), c(47.5, 52.5))

# Loop through each combination of sample size and acceptance region
results <- data.frame()
for (size in sample_sizes) {
  for (region in acceptance_regions) {
    result <- calculate_errors_and_power(size, region[1], region[2], mu_null, mu_alt, sigma)
    results <- rbind(results, c(size, region[1], region[2], result))
  }
}

# Print the results
colnames(results) <- c("Sample Size", "Lower Limit", "Upper Limit", "Type I Error (α)", "Type II Error (β)", "Power")
print(results)
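As an optional check, not part of the original solution, the analytic error probabilities for the baseline setting (n = 10, acceptance region 48.5 < X̄ < 51.5) can be verified by Monte Carlo simulation: draw many samples under each hypothesis and count how often the sample mean falls outside (under H0) or inside (under H1) the acceptance region. A minimal sketch, reusing the variables defined above:

# Optional Monte Carlo check of α and β for the baseline setting (a sketch)
set.seed(1)
n_sim <- 100000
xbar_h0 <- replicate(n_sim, mean(rnorm(n, mean = mu_null, sd = sigma)))
xbar_h1 <- replicate(n_sim, mean(rnorm(n, mean = mu_alt, sd = sigma)))

# α: proportion of H0 samples whose mean falls outside the acceptance region
alpha_sim <- mean(xbar_h0 <= lower_limit | xbar_h0 >= upper_limit)

# β: proportion of H1 samples whose mean falls inside the acceptance region
beta_sim <- mean(xbar_h1 > lower_limit & xbar_h1 < upper_limit)

cat("Simulated α:", alpha_sim, " Simulated β:", beta_sim, "\n")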

Problem 02: Using any suitable data, test the overall significance of the treatments. Also, perform multiple comparison tests (Tukey, Duncan, Scheffé, and Bonferroni) to test the pairwise significance of the treatments using the same data.

Solution:

# Install (once, if needed) and load the necessary packages
install.packages("agricolae")
install.packages("multcomp")
library(agricolae) # for Duncan's and Scheffé's tests
library(multcomp)  # for Bonferroni and Tukey's tests

# Step 1: Simulate some example data (replace this with your real dataset if you have one)
set.seed(123) # For reproducibility
treatment <- factor(rep(c("A", "B", "C", "D"), each = 10)) # 4 treatments with 10 observations each
value <- c(rnorm(10, mean = 5, sd = 2),  # Treatment A
           rnorm(10, mean = 7, sd = 2),  # Treatment B
           rnorm(10, mean = 6, sd = 2),  # Treatment C
           rnorm(10, mean = 8, sd = 2))  # Treatment D

data <- data.frame(treatment, value)

# Step 2: Perform ANOVA to test overall significance
anova_result <- aov(value ~ treatment, data = data)
summary(anova_result)

# If the p-value < 0.05, we reject the null hypothesis and conclude that there is a significant
# difference among treatments
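The p-value can also be extracted from the fitted object programmatically rather than read off the printed table; a small sketch, assuming the anova_result fit above:

# Extract the treatment p-value from the ANOVA table (a sketch)
p_val <- summary(anova_result)[[1]][["Pr(>F)"]][1]
if (p_val < 0.05) {
  cat("ANOVA is significant (p =", p_val, "); proceed to multiple comparisons.\n")
} else {
  cat("ANOVA is not significant (p =", p_val, ").\n")
}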

# Step 3: Multiple comparison tests if ANOVA is significant

# Tukey's HSD (Honestly Significant Difference) Test
tukey_result <- TukeyHSD(anova_result)
print(tukey_result)

# Duncan's Multiple Range Test
duncan_result <- duncan.test(anova_result, "treatment", group = TRUE)
print(duncan_result)

# Scheffe's Test
scheffe_result <- scheffe.test(anova_result, "treatment")
print(scheffe_result)

# Bonferroni's Test
# glht() from the multcomp package builds all pairwise contrasts; requesting the
# Bonferroni adjustment in summary() gives Bonferroni-corrected p-values
bonferroni_result <- glht(anova_result, linfct = mcp(treatment = "Tukey"))
summary(bonferroni_result, test = adjusted("bonferroni"))
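Alternatively, base R's pairwise.t.test() gives Bonferroni-adjusted pairwise comparisons directly, without extra packages (a sketch using the same simulated data):

# Bonferroni-adjusted pairwise t-tests with base R
pairwise.t.test(data$value, data$treatment, p.adjust.method = "bonferroni")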

Problem 03: Perform a permutation test or a two-sample bootstrap for any suitable data.

Solution:

# Set seed for reproducibility
set.seed(123)

# Example data: two samples (Group A and Group B)
group_A <- c(5.2, 6.1, 5.8, 6.5, 6.3)
group_B <- c(7.1, 7.3, 6.8, 7.4, 7.2)

# Combine the two groups into one data vector
combined_data <- c(group_A, group_B)

# Observed difference in means
obs_diff <- mean(group_A) - mean(group_B)

# Number of permutations
n_permutations <- 10000

# Vector to store permutation differences
permutation_diffs <- numeric(n_permutations)

# Perform the permutation test
for (i in 1:n_permutations) {
  # Shuffle the combined data and split it back into two groups
  permuted_data <- sample(combined_data)
  perm_group_A <- permuted_data[1:length(group_A)]
  perm_group_B <- permuted_data[(length(group_A) + 1):length(combined_data)]

  # Calculate the difference in means for this permutation
  permutation_diffs[i] <- mean(perm_group_A) - mean(perm_group_B)
}

# P-value: the proportion of permutations where the absolute difference is
# greater than or equal to the observed difference
p_value_perm <- mean(abs(permutation_diffs) >= abs(obs_diff))

# Print the results
cat("Observed Difference in Means:", obs_diff, "\n")
cat("P-value from Permutation Test:", p_value_perm, "\n")

Problem 04: From a suitable data set, estimate the density function f by the kernel method using kernels such as Epanechnikov, triangle, quartic, and Gaussian.

Solution:

# Load necessary libraries (optional here: the code below uses only base R graphics)
install.packages("ggplot2")
library(ggplot2)
# Step 1: Generate a synthetic dataset (e.g., 1000 observations from a normal distribution)
set.seed(123) # For reproducibility
data <- rnorm(1000, mean = 5, sd = 2)

# Step 2: Estimate the density using different kernels
density_epanechnikov <- density(data, kernel = "epanechnikov")
density_triangle <- density(data, kernel = "triangular")
density_quartic <- density(data, kernel = "biweight") # "biweight" is the quartic kernel
density_gaussian <- density(data, kernel = "gaussian")

# Step 3: Plot the density estimates
plot(density_epanechnikov, main = "Kernel Density Estimation", col = "blue", xlim = c(-5, 15),
     lwd = 2, xlab = "X", ylab = "Density")
lines(density_triangle, col = "red", lwd = 2)
lines(density_quartic, col = "green", lwd = 2)
lines(density_gaussian, col = "purple", lwd = 2)

# Add a legend
legend("topright", legend = c("Epanechnikov", "Triangle", "Quartic", "Gaussian"),
       col = c("blue", "red", "green", "purple"), lwd = 2)
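For a fairer comparison of kernel shapes, the same bandwidth can be fixed across the four estimates; a sketch, noting that in density() the bw argument is the standard deviation of the smoothing kernel and bw.nrd0() is R's default rule-of-thumb bandwidth:

# Re-estimate with a common bandwidth so only the kernel shape differs (a sketch)
bw_common <- bw.nrd0(data)
plot(density(data, kernel = "epanechnikov", bw = bw_common),
     main = "KDE with Common Bandwidth", col = "blue", lwd = 2)
lines(density(data, kernel = "triangular", bw = bw_common), col = "red", lwd = 2)
lines(density(data, kernel = "biweight", bw = bw_common), col = "green", lwd = 2)
lines(density(data, kernel = "gaussian", bw = bw_common), col = "purple", lwd = 2)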
