Monte Carlo methods

Scott Ferson
Applied Biomathematics
scott@ramas.com
Background
• Most risk assessments are deterministic and deliberately conservative
• However ...
  – degree of conservatism is opaque, unquantified, and can be inconsistent
  – difficult to characterize risk, except in extreme situations
What’s needed
An assessment should tell us

• How likely the various consequences are

• How reliable the conclusions are


Why do an uncertainty analysis
• The only way to get at likelihoods
• Produces better understanding of risk
• Promotes transparency
• Enhances credibility
• Improves decision making
• EPA guidance now available
Randomness
• “Random” means unpredictable
• Distribution of values
• Distribution may be constant (stationary)
• Distributions used to model both variability and incertitude at the same time
Distributions
• Density distribution or mass distribution, p(x)
• Cumulative distribution function (CDF, integral of density), Pr(x < X)
• Complementary CDF (exceedance risk), Pr(x > X)

[Figure: the three ways of plotting a distribution against X]
Continuous distributions
• Normal
• Uniform
• Exponential
• Lognormal
• Triangular
• Weibull
• Beta
• Laplace
• Many, many more

[Figure: cumulative probability curves of several continuous distributions]
Discrete distributions
• Binomial
• Poisson
• Discrete uniform
• Reciprocal

[Figure: step-shaped cumulative distributions of discrete random variables]

Distributions can also be “mixed” (neither continuous nor discrete)
Ensemble
• Statisticians call it the “population” or
“reference class”
• E.g., dysfunction rates in prostate patients
• 50% of attempts at sex fail, or
• 50% of men are totally impotent
• If the analyst can't tell you what ensemble
the distribution represents, the analysis is
probably meaningless
Probabilistic models
• The same as deterministic models
except that point values are replaced
by probability distributions

• Well, almost
• Ensembles
• Distribution shapes and tails
• Dependencies
• Backcalculation
Computational methods
• Analytical approaches
– Laplace and Mellin transforms
– Delta method (just the mean and variance)

• Discrete probability distributions

• Monte Carlo simulation


Monte Carlo simulation

[Figure: convolution — the distribution of X plus the distribution of Y gives the distribution of X+Y]
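The convolution in the figure can be sketched in a few lines of Monte Carlo: sample X and Y independently, add the samples, and the resulting collection approximates the distribution of X+Y. The distributions and sample size below are illustrative, not from the slides.

```python
# Monte Carlo "convolution": estimate the distribution of X+Y by sampling
# X and Y independently and adding the draws pairwise.
import random

random.seed(1)
n = 10_000
x = [random.gauss(2.0, 1.0) for _ in range(n)]   # X ~ normal(2, 1)
y = [random.expovariate(0.5) for _ in range(n)]  # Y ~ exponential with mean 2
z = [xi + yi for xi, yi in zip(x, y)]            # one realization of X+Y per rep

mean_z = sum(z) / n
print(mean_z)  # under independence E[X+Y] = 2 + 2 = 4
```

Tallying `z` into a histogram gives the right-hand curve of the figure.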
Generating random deviates
• Can make realizations from an arbitrary distribution using uniform deviates
• There are many uniform generators (computerized eenie-meanie-miney-moe)
• Many distribution shapes have special algorithms that are faster or better
Sampling arbitrary distributions
[Figure: a uniform deviate on the cumulative-probability axis is mapped through the CDF to a value of the random variable]
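The figure's construction is inverse-transform sampling: draw u uniformly on [0,1) and return the value whose cumulative probability equals u. A minimal sketch for the exponential distribution, whose CDF F(x) = 1 − exp(−x/mean) inverts in closed form:

```python
# Inverse-transform sampling: map a uniform deviate through the inverse CDF.
import math
import random

def sample_exponential(mean, rng):
    u = rng.random()                  # uniform deviate in [0, 1)
    return -mean * math.log(1.0 - u)  # inverse CDF of the exponential

rng = random.Random(42)
draws = [sample_exponential(10.0, rng) for _ in range(20_000)]
avg = sum(draws) / len(draws)
print(avg)  # near the true mean of 10
```

For distributions without a closed-form inverse, the same idea works numerically by searching the tabulated CDF.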
How to specify distributions
• Default distributions
come right out of the book
• Fitted or empirical distributions
usually not enough data available
• Extrapolations and surrogate data
requires professional judgment
• Elicitation from experts
expensive, controversial when experts disagree
• Maximum entropy criterion
inconsistent through changes of scale

All distributions should be reality-checked


Fitting distributions
• Assumes you know the distribution family, or
array of possible families
• Different methods give different results
– Method of moments (mean and variance are all that matters)
– Maximum likelihood
– Regression methods

• Goodness of fit (less important than sensibleness)


– Chi squared
– Anderson-Darling (focuses on tails)
– Kolmogorov-Smirnov (largest vertical difference)
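A small illustration of the point that different fitting methods give different results, using a family where they diverge clearly: fitting a uniform(a, b). Method of moments solves mean = (a+b)/2 and sd = (b−a)/√12, while maximum likelihood is simply the sample minimum and maximum. The data here are synthetic, for illustration only.

```python
# Two fitting methods applied to the same sample give two different answers.
import math
import random
import statistics

random.seed(3)
data = [random.uniform(10.0, 20.0) for _ in range(200)]

m = statistics.fmean(data)
s = statistics.pstdev(data)

# Method of moments: solve mean = (a+b)/2 and sd = (b-a)/sqrt(12)
half_width = math.sqrt(3.0) * s
a_mom, b_mom = m - half_width, m + half_width

# Maximum likelihood for the uniform: the sample minimum and maximum
a_mle, b_mle = min(data), max(data)

print((a_mom, b_mom), (a_mle, b_mle))
```

Both estimates are near the true (10, 20), but they never coincide exactly, which is why goodness of fit should be checked against sensibleness.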
Empirical distributions
Data: 0.653, 0.178, 0.263, 0.424, 0.284, 0.438, 0.471, 0.852, 0.480, 0.375,
0.148, 0.185, 0.320, 0.642, 0.247, 0.784, 0.643, 0.261, 0.636, 0.487

[Figure: empirical cumulative probability of the data, plotted against the value of the random variable]
Empirical distributions
• Need a fair amount of data to yield reliable estimates of distributions
• Tails are especially likely to be poorly modeled
• Depends on “plotting position”
Maximum entropy
• Information theory: maximize −k ∑ p_i ln p_i (sum over i = 1, …, n)
• Equiprobability of all possibilities
• Extends Laplace's “principle of insufficient reason”
• Specified by what is known
• Unbiased answer
• Mathematically more defensible than arbitrary assignment of distributions
• Much cheaper than expert elicitation
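The entropy expression −k ∑ p_i ln p_i can be computed directly (taking k = 1), and a quick check confirms the "equiprobability" bullet: among distributions over n outcomes, the uniform one has the largest entropy.

```python
# Entropy of a discrete distribution, -sum(p_i * ln p_i) with k = 1.
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed  = [0.70, 0.10, 0.10, 0.10]

print(entropy(uniform))  # ln(4), the maximum over 4 outcomes
print(entropy(skewed))   # strictly smaller
```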
Maximum entropy solutions
When you know Use this shape
{minimum, maximum} uniform
{mean, standard deviation} normal
{minimum, maximum, mode} beta
{minimum, mean} exponential
{min, max, mean, stddev} beta
{minimum = 0, some quantile} exponential
{minimum > 0, some quantile} gamma
{minimum, maximum, mean} beta
{mean, geometric mean} gamma
Maximum entropy’s problem
• Depends on the choice of scale
• For instance, knowing the possible range for degradation rate yields one
  distribution, but knowing the possible range for half-life yields a very
  different one, even though the information is exactly the same
Dependencies
• Independence (knowing X tells nothing about Y)
• Perfect dependence F(X) = G(Y)
• Opposite dependence F(X) = 1−G(Y)
• Complete dependence Y = z(X)
• Linearly correlated Y = mX + b + ε
• Ranks linearly correlated G(Y) = mF(X) + b + ε
• Functional modeling Y = z(X) + ε
• Complex dependence (anything else!)
Uncorrelatedness is not independence
Modeling dependencies
• Need to check that correlations are not
infeasible (e.g., corr(A,B) = 0.9, corr(A,C)=0.9,
corr(B,C)=−0.9)
• Monte Carlo simulations can model all the kinds
of dependencies except complex dependence
• Many dependence patterns yield the same
correlation coefficient (most MC software
arbitrarily selects one of these patterns)
• It can be difficult to specify the dependence if
empirical information is sparse
A ~ normal(2,1), B ~ lognormal(5,2)
[Figure: cumulative distributions of A+B from 1000 trials, under independence and under 0.7 correlation]

Differences in tail risks could be much greater if the dependence between A and B is nonlinear, even if their correlation is very small.
How many replications
• More is better (you're never really done)
• Tails are especially hard to nail down
• Curse of dimensionality
• Latin hypercube sampling can help
• Repeat the simulation as a check (best)
• Confidence intervals on fractiles
• Kolmogorov-Smirnov limits on distributions
100 versus 10,000 replicates
[Figure: close-ups of the left-hand tail of the distribution of time to contamination (yr), from 100 trials and from 10,000 trials]

Confidence interval for a fractile
The 100α% confidence interval for the pth fractile can be estimated by
[Y_i, Y_j], where Y_k is the kth smallest value from the Monte Carlo
simulation, i = floor(np − b), j = ceiling(np + b), and
b = Φ⁻¹((1+α)/2) √(np(1−p)), with Φ⁻¹ the standard normal quantile function

• Vary n to get the precision you desire
• But remember this represents only sampling error, not measurement error
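The order-statistic interval above can be coded directly; here the standard normal quantile Φ⁻¹ comes from `statistics.NormalDist`. The sample and fractile below are illustrative.

```python
# Confidence interval for a fractile from Monte Carlo output, using the
# order statistics Y_i, Y_j with i = floor(np - b), j = ceil(np + b).
import math
import random
from statistics import NormalDist

def fractile_ci(samples, p, alpha=0.95):
    ys = sorted(samples)
    n = len(ys)
    z = NormalDist().inv_cdf((1 + alpha) / 2)
    b = z * math.sqrt(n * p * (1 - p))
    i = max(math.floor(n * p - b), 1)
    j = min(math.ceil(n * p + b), n)
    return ys[i - 1], ys[j - 1]   # 1-based order statistics

random.seed(11)
draws = [random.gauss(0, 1) for _ in range(2000)]
lo, hi = fractile_ci(draws, 0.95)
print(lo, hi)  # the true 95th percentile of a standard normal is about 1.645
```

Increasing n narrows [lo, hi], which is the "vary n to get the precision you desire" bullet in action.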
Kolmogorov-Smirnov bounds
Bounds on the distribution as a whole
[Figure: 95% Kolmogorov-Smirnov bounds on empirical distributions from 100, 200, 1000, and 2000 replications; the bounds tighten as replications increase]

95% of the time, the entire distribution will lie within the bounds
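The half-width of the 95% band is the Kolmogorov-Smirnov critical distance, which for large n is about 1.36/√n; shifting the empirical CDF up and down by this amount gives the bounds in the figure. A sketch using the standard large-sample formula:

```python
# 95% K-S band half-width: D = sqrt(ln(2/alpha) / (2n)) ~ 1.36 / sqrt(n).
import math

def ks_band(n, alpha=0.05):
    return math.sqrt(-0.5 * math.log(alpha / 2)) / math.sqrt(n)

for n in (100, 200, 1000, 2000):
    print(n, round(ks_band(n), 3))
```

The shrinking values show why more replications tighten the bounds on the whole distribution.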
Latin hypercube sampling
• Stratify the sampling of distributions
– Divide each distribution into n equal-probability
regions, where n = #reps
– For each replicate, select one deviate from
each distribution
– Select each region only once
• Often much better for moments, sometimes
better for tails
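The stratification steps above can be sketched directly: divide [0,1] into n equal-probability regions, draw one deviate inside each, and shuffle so regions pair up randomly across variables. The resulting deviates can then be pushed through any inverse CDF.

```python
# Latin hypercube sampling of one variable's uniform deviates.
import random

def latin_hypercube(n, rng):
    # one uniform deviate inside each of n equal-probability strata
    u = [(k + rng.random()) / n for k in range(n)]
    rng.shuffle(u)  # so strata pair up randomly across variables
    return u

rng = random.Random(0)
u = latin_hypercube(10, rng)
bins = sorted(int(ui * 10) for ui in u)
print(bins)  # every region sampled exactly once: [0, 1, ..., 9]
```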
Each variable gets a single value
• A variable can't be independent of itself (e.g., a multiple-route exposure model)

  (c_air × i_air)/BW + (c_water × i_water)/BW + (c_soil × i_soil)/BW
• All BW’s should be the same value within a
single Monte Carlo replicate
• Be wary of “libraries” of partial results
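In simulation form, the rule means body weight is sampled once per replicate and reused in every route term, never re-sampled per term. All distributions below are illustrative placeholders, not values from the slides.

```python
# One body weight per replicate, shared by all exposure routes.
import random

rng = random.Random(5)

def one_replicate():
    bw = rng.gauss(70.0, 10.0)  # body weight: sampled ONCE per replicate
    routes = [(rng.uniform(0, 1), rng.uniform(0, 2)) for _ in range(3)]  # air, water, soil
    # every route term divides by the SAME bw value
    return sum(c * i for c, i in routes) / bw

doses = [one_replicate() for _ in range(1000)]
avg = sum(doses) / len(doses)
print(avg)
```

Sampling a fresh `bw` inside the route loop would make the variable spuriously independent of itself and understate the dose variance.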
Model uncertainty
• Introduce a new discrete variable
• Let the value of the variable dictate which model will be used in each rep
• Wiggle the value from rep to rep
• Only works for a short, explicit list of models (you have to list the models)
• Many theorists object to this strategy
• It is equivalent to a stochastic mixture


Model uncertainty as a mixture
If u>0.5 then model=I else model=II

[Figure: cumulative distributions of models I and II, with the I-or-II mixture running between them]
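The mixture rule on the slide, in simulation form: a uniform deviate u decides per replicate which model produces the result. Models I and II here are placeholder distributions for illustration.

```python
# Model uncertainty as a stochastic mixture: if u > 0.5 use model I,
# else model II, re-drawing u in every replicate.
import random

rng = random.Random(9)

def model_I():
    return rng.gauss(3.0, 0.5)

def model_II():
    return rng.gauss(5.0, 0.5)

results = []
for _ in range(4000):
    u = rng.random()
    results.append(model_I() if u > 0.5 else model_II())

mean_mix = sum(results) / len(results)
print(mean_mix)  # between the two models' means (3 and 5)
```

The tallied results form a distribution lying between those of the two models, as in the figure.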
Backcalculation
• Cleanup and remediation planning requires backcalculation
• Backcalculation with probability distributions is called deconvolution
• How can we untangle the expression A + B = C when we know A and C, and need B?
Can’t just invert the equation

dose = (conc × intake) / body mass

conc = (dose × body mass) / intake
[Figure: density functions of the planned dose, body mass, intake, and the backcalculated concentration]
Large doses in the realized distribution (arising from the allowed distribution of concentrations) are more common than were originally planned.

[Figure: planned versus realized dose densities]
Normal approximation
• If A+B=C, compute B as C−A under the assumption that the correlation between A and C is r = s_A/s_C
• Needs Pearson (not rank) correlation
• Usually okay if multivariate normal
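A sketch of the approximation above: sample A and C as normals with Pearson correlation r = s_A/s_C and set B = C − A. Under that correlation, var(B) = s_C² + s_A² − 2r·s_A·s_C = s_C² − s_A², exactly the spread B needs so that A + B = C. The parameters are illustrative.

```python
# Normal-approximation deconvolution: B = C - A with corr(A, C) = sA/sC.
import math
import random

rng = random.Random(2)
mu_a, s_a = 2.0, 1.0
mu_c, s_c = 7.0, 2.0
r = s_a / s_c  # required Pearson correlation

b = []
for _ in range(20_000):
    z1 = rng.gauss(0, 1)
    z2 = rng.gauss(0, 1)
    a = mu_a + s_a * z1
    c = mu_c + s_c * (r * z1 + math.sqrt(1 - r * r) * z2)  # corr(A, C) = r
    b.append(c - a)

mean_b = sum(b) / len(b)
var_b = sum((x - mean_b) ** 2 for x in b) / len(b)
print(mean_b, var_b)  # expect about 5 and s_c**2 - s_a**2 = 3
```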


Normal approximation with non-normal distributions
[Figure: with A = lognormal(5, 1.5) and C = triangular(0, 12, 15.5), the realized distribution of C rebuilt from B = C−A (taking r = s_A/s_C) departs from the planned distribution of C]
Iterative (trial & error) approach
• Initialize B with C−A
• This distribution is too wide
• Transform density p(x) to p(x)m
• Rescale so that area remains one
• Whenever m > 1 dispersion decreases
• Repeat until you get an acceptable fit
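The transform in the middle steps can be sketched on a discretized density: raise p(x) to a power m, renormalize to unit area, and the dispersion shrinks whenever m > 1. The grid and starting density below are illustrative.

```python
# Sharpening a density by the transform p(x) -> p(x)**m, renormalized.
import math

xs = [i * 0.1 for i in range(-50, 51)]  # grid on [-5, 5]
dx = 0.1
p = [math.exp(-x * x / 2) for x in xs]  # unnormalized normal-shaped density

def normalize(q):
    area = sum(q) * dx
    return [qi / area for qi in q]      # rescale so the area is one

def std(q):
    q = normalize(q)
    m1 = sum(x * qi for x, qi in zip(xs, q)) * dx
    return math.sqrt(sum((x - m1) ** 2 * qi for x, qi in zip(xs, q)) * dx)

sharper = [pi ** 2.0 for pi in p]       # m = 2, then renormalize inside std()
print(std(p), std(sharper))             # m > 1 decreases the dispersion
```

Repeating with different m until the forward calculation reproduces the planned distribution is the trial-and-error loop described above.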
By trial and error, you may be able to find a distribution of concentrations that yields something close to the planned distribution of doses. This is not always possible, however.

(conc × intake) / body mass = dose

ln(intake / body mass) + ln(conc) = ln(dose)
i.e., A + B = C, with A = ln(intake/body mass), B = ln(conc), C = ln(dose)

[Figure: planned dose density and the realized trial-and-error approximation]
Monte Carlo methods
• How?
– replace each point estimate with a distribution
– repeatedly sample from each distribution
– tally answers in histogram
• Why?
– simple to implement
– fairly simple to explain
– summarizes entire distribution of risk
• Why not?
– requires a lot of empirical information (or guesses)
– routine assumptions may be “non-protective”
Steps
• Clarify the questions and gather data
• Formulate the model (identify variables
and their interrelationships)
• Specify distributions for each input
• Specify dependencies and correlations
• Run simulation
• Conduct sensitivity studies
• Present results
• Discuss limitations of the assessment
Limitations of Monte Carlo
• Needs precise, stationary distributions
• Nonlinear dependencies often ignored
• Unknown dependencies often ignored
• Assumes model is well understood
• Can be computationally expensive
• Backcalculations difficult or impossible
• Treats incertitude like variability
• Thought by some to be too difficult
Pitfalls of Monte Carlo
• Muddled model
• Infeasible correlation matrices
• Inconsistent independence assumptions
• Multiply instantiated variables
• Too few replications
• No sensitivity studies
• Confounding of different populations
Kinds of uncertainty
• Variability
– Stochasticity
– Heterogeneity
– Spatial variation
– Individuality and genetics
• Incertitude
– Measurement error
– Ignorance
– Model uncertainty
• Vagueness, confusion, etc.
Why distinguish them?
• Empirical effort can reduce incertitude
• Usually can't reduce variability
• Managers need to distinguish the two kinds of uncertainty to plan remediation efforts effectively
Two-dimensional simulation
• Monte Carlo nested inside Monte Carlo
• Inner loop for variability
• Outer loop for uncertainty (incertitude)
• Squared number of replications
• Integrated sensitivity analysis
Probability boxes
• Bounds about the cumulative distribution
• Rigorous (rather than approximate)
• Analytical (not from simulations)
• Can account for lack of knowledge about
distribution shapes and dependencies
[Figure: three example probability boxes, each a pair of bounding CDFs]
