
Statistical Computing

November 2024

1.(a) Let

f(x) = \begin{cases} \frac{x}{2}\exp\left(-\frac{x^2}{4}\right) & \text{for } x > 0,\\ 0 & \text{for } x \le 0. \end{cases}

Then, the probability that X > 4 is given by

\Pr(X > 4) = \int_{4}^{\infty} f(x)\,dx = \int_{4}^{\infty} \frac{x}{2}\exp\left(-\frac{x^2}{4}\right)dx.
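Before turning to R, note that the integral has a closed form, which serves as a check on the numerical result (the substitution is our addition): since \frac{d}{dx}\big({-e^{-x^2/4}}\big) = \frac{x}{2}e^{-x^2/4},

\Pr(X > 4) = \Big[-e^{-x^2/4}\Big]_{4}^{\infty} = e^{-4} \approx 0.01831564.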

Using the R code shown below, we compute this result to get 0.01831564:

# Weibull(shape = 2, scale = 2) density, written directly from f(x)
weibull <- function(x) {
  ifelse(x > 0, (x / 2) * exp(-x^2 / 4), 0)
}
# Numerical integration of the tail from 4 to infinity
Pr4 <- integrate(weibull, lower = 4, upper = Inf)
Pr4$value
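As a further cross-check (a minimal sketch; it relies on f(x) being the Weibull density with shape 2 and scale 2, the same parametrisation used by the sampling code below):

# Tail probability from R's built-in Weibull distribution function;
# should agree with Pr4$value, i.e. about 0.01831564
1 - pweibull(4, shape = 2, scale = 2)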

1.(b) If we denote θ as Pr(X > 4), then

\theta = \Pr(X > 4) = \int_{4}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} I(x > 4)\,f(x)\,dx = E_f[I(X > 4)],

where I(x > 4) is the indicator function:


I(x > 4) = \begin{cases} 1 & \text{if } x > 4,\\ 0 & \text{if } x \le 4. \end{cases}

Using Monte Carlo integration, this integral is approximated by the expectation:

\hat{\theta}_f = \frac{1}{n} \sum_{i=1}^{n} I(x_i > 4),

where {x_1, x_2, ..., x_n} are independent and identically distributed samples drawn from the probability density function f(x).
Using a sample size of n = 1000, the Monte Carlo estimate for θ̂f was computed to be approximately
0.019. This value represents the proportion of the distribution f (x) that lies in the region x > 4,
as estimated by the randomly drawn samples.

MC <- function(n) {
  # Draw n samples from the Weibull(shape = 2, scale = 2) distribution
  MCsample <- rweibull(n, shape = 2, scale = 2)
  # Proportion of samples exceeding 4 estimates Pr(X > 4)
  Finprop <- mean(MCsample > 4)
  Finprop
}
MC(1000)
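Because θ̂f is a sample proportion, its simulation standard error can be reported alongside the point estimate. The helper below is our own sketch (the name MC_se and its output format are illustrative, not part of the original solution):

# Monte Carlo estimate of Pr(X > 4) together with its estimated
# standard error, using the usual formula for a sample proportion
MC_se <- function(n) {
  MCsample <- rweibull(n, shape = 2, scale = 2)
  p_hat <- mean(MCsample > 4)
  se_hat <- sqrt(p_hat * (1 - p_hat) / n)
  c(estimate = p_hat, std_error = se_hat)
}
MC_se(1000)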

1.(c) Let g(x) denote the density function of the normal distribution with user-specified values for µ and
σ 2 . Using importance sampling, we can express θ as:
\theta = \int_{-\infty}^{\infty} I(x > 4)\,\frac{f(x)}{g(x)}\,g(x)\,dx = \int_{-\infty}^{\infty} \varphi(x)\,g(x)\,dx = E_g[\varphi(x)],

where
\varphi(x) = \begin{cases} \frac{f(x)}{g(x)} & \text{if } x > 4,\\ 0 & \text{otherwise.} \end{cases}

Then the estimator θ̂g for θ is given by:

\hat{\theta}_g = \frac{1}{n} \sum_{i=1}^{n} \varphi(x_i),

where {x1 , x2 , . . . , xn } are samples drawn independently from the normal distribution with density
function g(x). To estimate θ, one needs to generate a sample of size n from g(x) and compute the
mean of ϕ(x) over the sampled values.

impsamg <- function(n, mean, sd) {
  phi <- numeric(n)
  for (i in 1:n) {
    # Draw from the proposal g(x) = N(mean, sd^2)
    x <- rnorm(1, mean = mean, sd = sd)
    if (x > 4) {
      # Importance weight phi(x) = f(x) / g(x)
      phi[i] <- (x / 2) * exp(-x^2 / 4) / dnorm(x, mean = mean, sd = sd)
    } else {
      phi[i] <- 0
    }
  }
  mean(phi)
}
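For example, anticipating the proposal parameters selected in part (d) below, a call along the following lines should return an estimate close to the true value 0.01831564:

# Importance sampling estimate with proposal N(4.25, 1)
impsamg(1000, mean = 4.25, sd = 1)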

1.(d) From the properties of the Monte Carlo estimator, we know that θ̂g is unbiased, and its variance
is given by:

\operatorname{Var}(\hat{\theta}_g) = \frac{\operatorname{Var}(\varphi(x_1))}{n}.

To minimise Var(θ̂g ), the function g(x) should be chosen such that ϕ(x) is nearly constant. In
this scenario, g(x) becomes approximately proportional to the integrand ϕ(x)g(x). In other words,
we aim to select g(x) so that its behaviour closely resembles the density of f (x) when x > 4. By
doing so, g(x) will assign higher probability to regions where ϕ(x)g(x) is large, thus improving the
efficiency of the estimator.
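It is worth recording the standard result behind this heuristic (our addition, stated for completeness; it is not used directly in the code): for a non-negative integrand, the zero-variance proposal is the integrand renormalised to a density,

g^*(x) = \frac{I(x > 4)\,f(x)}{\theta},

that is, the Weibull density f(x) truncated to (4, ∞). Since g(x) is restricted to the normal family here, the practical goal is to mimic this truncated tail as closely as possible.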
Since the focus is on the upper tail of the Weibull density, the normal proposal g(x) should be centred around values likely to fall in this region. Consequently, a sequence of mean values (4.0, 4.25, 4.5, 4.75, 5.0) and variance values (1.0, 1.5, 2.0, 2.5) was considered to identify the optimal parameters that minimise Var(θ̂g).
From the R code implementation, it was observed that when the mean is set to 4.25 and the variance to 1.0, the variance Var(θ̂g) is the smallest among the tested combinations. This indicates that these parameters for g(x) provide the most efficient importance sampling proposal for this problem.

var_thetahat <- function(mean, variance, n) {
  phi <- numeric(n)
  for (i in 1:n) {
    # Draw from the proposal g(x) = N(mean, variance)
    x <- rnorm(1, mean = mean, sd = sqrt(variance))
    if (x > 4) {
      # Importance weight f(x) / g(x), using the built-in Weibull density
      phi[i] <- dweibull(x, 2, 2) / dnorm(x, mean = mean, sd = sqrt(variance))
    } else {
      phi[i] <- 0
    }
  }
  mean_estimate <- mean(phi)
  var_estimate <- var(phi) / n  # estimate of Var(theta_hat_g)
  c(mean_estimate, var_estimate)
}
means <- c(4.0, 4.25, 4.5, 4.75, 5.0)
variances <- c(1.0, 1.5, 2.0, 2.5)
n <- 1000
results <- data.frame(Mean = numeric(), Variance = numeric(),
Estimated_Mean = numeric(),
Variance_of_Estimate = numeric())
for (mean in means) {
for (variance in variances) {
result <- var_thetahat(mean, variance, n)
results <- rbind(results, data.frame(Mean = mean,
Variance = variance,
Estimated_Mean = result[1],
Variance_of_Estimate = result[2]))
}
}
best_result <- results[which.min(results$Variance_of_Estimate),
c("Mean", "Variance")]
best_result
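One caveat, added here as our own note: the grid search is itself random, so the winning (mean, variance) pair can vary from run to run. Fixing the seed before the search loop makes the comparison reproducible:

# Fix the RNG state so every (mean, variance) pair is evaluated on a
# reproducible random stream; the seed value itself is arbitrary
set.seed(1)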

1.(e) To compare the two methods for sample sizes n ranging from 1 to 1000, we rewrote the code to generate a plot illustrating the iteration process for both approaches, using the true value as a reference. In this case, we chose g(x) to be the density function of N(4.25, 1). The plot, shown in Figure 1, provides a visual representation of how the estimators converge as the sample size increases.
From Figure 1, it is evident that the importance sampling method outperforms the Monte Carlo
integration when estimating P r(X > 4). The importance sampling approach converges more
quickly to the true value, with significantly lower variance across iterations. This demonstrates
the efficiency of importance sampling, particularly when dealing with tail probabilities such as
P r(X > 4), where the event of interest lies in a region with low probability density.

MC2 <- function(n) {
  MCsample <- rweibull(n, shape = 2, scale = 2)
  # Running proportion of samples exceeding 4 after each new draw
  Finprop <- cumsum(MCsample > 4) / seq_len(n)
  Finprop
}

impsamg2 <- function(n) {
  phi <- numeric(n)
  e <- numeric(n)
  for (i in 1:n) {
    x <- rnorm(1, mean = 4.25, sd = 1)
    if (x > 4) {
      phi[i] <- (x / 2) * exp(-x^2 / 4) / dnorm(x, mean = 4.25, sd = 1)
    } else {
      phi[i] <- 0
    }
    # Running importance sampling estimate after i draws
    e[i] <- mean(phi[1:i])
  }
  e
}
n <- 1000
mc2_results <- MC2(n)
impsamg2_results <- impsamg2(n)
true_prob <- rep(1 - pweibull(4, shape = 2, scale = 2), n)
plot(mc2_results, xlab = "Iterations", ylab = "Probability Estimate",
     main = "Comparison of Pr(X > 4) Estimation Methods",
     ylim = c(0, 0.05), type = "l", col = "blue")
lines(impsamg2_results, lty = 2, col = "green")
lines(true_prob, lty = 3, col = "purple")

Figure 1: Estimation of Pr(X > 4); importance sampling (green line), Monte Carlo integration (blue line) and actual Pr(X > 4) (purple line). [Plot omitted; x-axis: Iterations, y-axis: Probability Estimate]

1.(f) We developed a function to perform 500 iterations of the simulation process for both the Monte
Carlo integration and importance sampling methods, with each simulation using a sample size of
1000. This function allows us to evaluate and compare the performance of the two approaches
under repeated sampling conditions.

MCsim <- function(nsim, n) {
  MCsimvalue <- numeric(nsim)
  for (i in 1:nsim) {
    # One Monte Carlo estimate of Pr(X > 4) per simulated data set
    MCsample <- rweibull(n, shape = 2, scale = 2)
    MCsimvalue[i] <- mean(MCsample > 4)
  }
  MCsimvalue
}
MCsim_result <- MCsim(500, 1000)
MCsim_result

imsamg_sim <- function(nsim, n) {
  phi_sim <- numeric(nsim)
  for (j in 1:nsim) {
    phi <- numeric(n)
    for (i in 1:n) {
      x <- rnorm(1, mean = 4.25, sd = 1)
      if (x > 4) {
        phi[i] <- (x / 2) * exp(-x^2 / 4) / dnorm(x, mean = 4.25, sd = 1)
      } else {
        phi[i] <- 0
      }
    }
    # One importance sampling estimate per simulation
    phi_sim[j] <- mean(phi)
  }
  phi_sim
}

imsamg_sim_result<-imsamg_sim(500,1000)
imsamg_sim_result

1.(gi) We plotted two histograms of the estimates from the Monte Carlo integration and importance sampling methods, superimposing the corresponding normal distributions with their respective sample means and variances. To assess the normality of the samples, we applied the Kolmogorov–Smirnov test.
Figure 2: Histogram of Monte Carlo Integration Samples [histogram of MCsim_result with superimposed normal density; x-axis: MCsim_result, y-axis: Density]

However, for the Monte Carlo integration samples, tied values in the data (multiple occurrences of the same value) disrupt the smoothness of the empirical CDF, making the Kolmogorov–Smirnov test less reliable. We therefore employed the Lilliefors test, an adaptation of the Kolmogorov–Smirnov test suited to this scenario.
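The ties are inherent to the estimator: with n = 1000, every Monte Carlo estimate is a multiple of 1/1000, so the 500 replicates can take only a small number of distinct values. A one-line check (our addition) makes this visible:

# Count the distinct values among the 500 Monte Carlo estimates;
# each estimate is k/1000 for an integer k, so ties are unavoidable
length(unique(MCsim_result))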
First, considering the Monte Carlo integration samples, Figure 2 provides an overview of their
distribution, which is asymmetrical and right-skewed. Hypothesis testing was conducted with the
null hypothesis that the samples follow a normal distribution versus the alternative hypothesis that
they do not. Using the Lilliefors test, we obtained a p-value = 5.218 × 10−7 , which is significantly
smaller than 0.05. Thus, we rejected the null hypothesis at the 5% significance level, concluding
that the Monte Carlo integration samples are not normally distributed.
Next, for the importance sampling samples, Figure 3 shows a distribution that visually fits the
theoretical normal distribution. To confirm this, we conducted the Kolmogorov–Smirnov test with
the same hypothesis as above, resulting in a p-value = 0.9648, which is much larger than 0.05.
Therefore, we failed to reject the null hypothesis, concluding that these samples are normally
distributed at the 5% significance level. Additionally, the QQ-plot in Figure 4 further supports this
conclusion by showing a strong agreement with the theoretical quantiles of the normal distribution.
Hence, we confirmed that the importance sampling samples follow a normal distribution with a
mean and variance matching the empirical mean and variance of the data.

MCsim_result<-MCsim(500,1000)
mean_MCsim_result<-mean(MCsim_result)
mean_MCsim_result
variance_MCsim_result<-var(MCsim_result)
variance_MCsim_result
hist(MCsim_result, freq = FALSE, main = "Histogram of MCsim_result",
xlab = "MCsim_result",xlim = c(0,0.035))
curve(dnorm(x, mean = mean_MCsim_result, sd = sqrt(variance_MCsim_result)),
add = TRUE, col = "red", lwd = 2)
library(nortest)
lillie.test(MCsim_result)

imsamg_sim_result<-imsamg_sim(500,1000)

Figure 3: Histogram of Importance Sampling Samples [histogram of imsamg_sim_result with superimposed normal density; x-axis: imsamg_sim_result, y-axis: Density]

Figure 4: Normal Quantile-Quantile Plot of the Importance Sampling Samples [x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

mean_imsamg_sim_result<-mean(imsamg_sim_result)
mean_imsamg_sim_result
variance_imsamg_sim_result<-var(imsamg_sim_result)
variance_imsamg_sim_result
hist(imsamg_sim_result, freq = FALSE, main = "Histogram of imsamg_sim_result",
xlab = "imsamg_sim_result")
curve(dnorm(x, mean = mean_imsamg_sim_result,
sd = sqrt(variance_imsamg_sim_result)),
add = TRUE, col = "red", lwd = 2)

ks.test(imsamg_sim_result, "pnorm", mean = mean_imsamg_sim_result,
        sd = sqrt(variance_imsamg_sim_result))

qqnorm(imsamg_sim_result,
main = "Normal Q-Q Plot of imsamg_sim_result (nsim = 500)")
qqline(imsamg_sim_result, col = "red", lwd = 2)

1.(gii) First, let us consider Monte Carlo integration. The bias is calculated as the mean of the estimator samples minus the true value. From the code, we obtain bias = 3.236111 × 10−5, which is extremely small and consistent with the estimator θ̂f being unbiased.
Let E(θ̂) = ψ = ψ(θ). The mean squared error (mse) can be expressed as:

\begin{aligned}
\operatorname{mse}(\hat{\theta}) &= E\Big[\big((\hat{\theta} - \psi) + (\psi - \theta)\big)^2\Big] \\
&= E\Big[(\hat{\theta} - \psi)^2 + 2(\hat{\theta} - \psi)(\psi - \theta) + (\psi - \theta)^2\Big] \\
&= \operatorname{Var}(\hat{\theta}) + 2(\psi - \theta)\,E(\hat{\theta} - \psi) + (\psi - \theta)^2 \\
&= \operatorname{Var}(\hat{\theta}) + (\psi - \theta)^2,
\end{aligned}

since E(θ̂ − ψ) = 0.

Hence, the mean squared error can be written in terms of bias and variance as:

\operatorname{mse}(\hat{\theta}) = \operatorname{Var}(\hat{\theta}) + \operatorname{bias}^2(\hat{\theta}).

From the simulation, the variance and mean squared error are Var(θ̂f) = 1.783457 × 10−5 and mse(θ̂f) = 1.783561 × 10−5, both of which are small. This supports the consistency of θ̂f, since mse(θ̂f) → 0 as n → ∞.
We now compare the bias, variance, and mean squared error from the simulated data with the
exact theoretical values. For the exact mean:

E(\hat{\theta}_f) = E\left(\frac{1}{n}\sum_{i=1}^{n} I(x_i > 4)\right) = E(I(X > 4)) = \Pr(X > 4) = \theta = 0.01831564,

showing that bias(θ̂f) = 0, i.e. θ̂f is unbiased. The simulation result obtained above agrees with this property.
For the variance:

\operatorname{Var}(\hat{\theta}_f) = \frac{1}{n^2}\cdot n \cdot \operatorname{Var}(I(X > 4)) = \frac{1}{n}\Pr(X > 4)\big(1 - \Pr(X > 4)\big) = 1.798018 \times 10^{-5},
when n=1000, which closely matches the variance from the simulated data. Finally, for the mean
squared error:

\operatorname{mse}(\hat{\theta}_f) = \operatorname{Var}(\hat{\theta}_f) = \frac{1}{n}\Pr(X > 4)\big(1 - \Pr(X > 4)\big) = 1.798018 \times 10^{-5},
closely aligning with the simulated result.
Next, we focus on importance sampling. From the code, we obtain bias = −5.390686 × 10−5, which is extremely small and consistent with the estimator θ̂g being unbiased. Furthermore, the variance and mean squared error are Var(θ̂g) = 5.272578 × 10−7 and mse(θ̂g) = 5.301638 × 10−7, both of which are small. This supports the consistency of θ̂g, since mse(θ̂g) → 0 as n → ∞.
Hence, we can conclude that both estimators are effectively unbiased and consistent, while the importance sampling estimator attains a markedly smaller variance and mean squared error than Monte Carlo integration.

bias_MC <- mean_MCsim_result - (1 - pweibull(4, shape = 2, scale = 2))
bias_MC
variance_MCsim_result <- var(MCsim_result)
variance_MCsim_result
mse_MC <- variance_MCsim_result + (bias_MC)^2
mse_MC
exact_variance <- (1/n) * Pr4$value * (1 - Pr4$value)
exact_variance

bias_impsamp <- mean_imsamg_sim_result - (1 - pweibull(4, shape = 2, scale = 2))
bias_impsamp
variance_imsamg_sim_result <- var(imsamg_sim_result)
variance_imsamg_sim_result
mse_impsamp <- variance_imsamg_sim_result + (bias_impsamp)^2
mse_impsamp
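As a final summary of the efficiency gain (a sketch using the simulation objects above), the ratio of the two estimator variances, roughly 1.783457 × 10−5 / 5.272578 × 10−7 ≈ 34, can be computed directly:

# Relative efficiency: how many times smaller the importance sampling
# variance is compared with plain Monte Carlo integration
variance_MCsim_result / variance_imsamg_sim_result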
