
Monte Carlo Simulation and Resampling

Tom Carsey (Instructor)


Jeff Harden (TA)
ICPSR Summer Course

Summer, 2011


Resampling
Resampling methods share many similarities with Monte Carlo
simulations; in fact, some refer to resampling methods as a
type of Monte Carlo simulation.
Resampling methods use a computer to generate a large
number of simulated samples.
Patterns in these samples are then summarized and analyzed.
However, in resampling methods, the simulated samples are
drawn from the existing sample of data you have in your
hands and NOT from a theoretically defined (researcher
defined) DGP.
Thus, in resampling methods, the researcher DOES NOT
know or control the DGP, but the goal of learning about the
DGP remains.

Resampling Principles
Begin with the assumption that there is some population
DGP that remains unobserved.
That DGP produced the one sample of data you have in your
hands.
Now, draw a new sample of data that consists of a
different mix of the cases in your original sample. Repeat
that many times so you have a lot of new simulated
samples.
The fundamental assumption is that all information about
the DGP contained in the original sample of data is also
contained in the distribution of these simulated samples.
If so, then resampling from the one sample you have is
equivalent to generating completely new random samples
from the population DGP.

Resampling Principles (2)


Another way to think about this is that if the sample of data
you have in your hands is a reasonable representation of the
population, then the distribution of parameter estimates
produced from running a model on a series of resampled data
sets will provide a good approximation of the distribution of
that statistic in the population.
Resampling methods can be parametric or non-parametric.
In either type, but especially in the non-parametric case, yet
another way to justify resampling methods, according to
Mooney (1993), is that sometimes it is "better to draw
conclusions about the characteristics of a population strictly
from the sample at hand, rather than by making perhaps
unrealistic assumptions about that population" (p. 1).

Common Resampling Techniques

Bootstrap
Jackknife
Permutation/randomization tests
Posterior sampling
Cross-validation


The Bootstrap

Formally introduced by Efron (1979).


There are a variety of bootstrap methods, but at their core is
a common process:
Begin with an observed sample of size N.
Generate a simulated sample of size N by drawing
observations from your observed sample independently and
with replacement.
Compute and save the statistic of interest.
Repeat this process many times (e.g. 1,000).
Treat the distribution of your estimated statistics of interest
as an estimate of the population distribution of that statistic.
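A minimal R sketch of this process for the sample mean; the data vector x here is simulated purely for illustration and stands in for your observed sample:

set.seed(123)
x <- rnorm(30, mean = 50, sd = 10)   # stand-in for your observed sample
N <- length(x)
R <- 1000                            # number of bootstrap replicates
boot.means <- numeric(R)
for (r in 1:R) {
  x.star <- sample(x, N, replace = TRUE)   # simulated sample of size N
  boot.means[r] <- mean(x.star)            # compute and save the statistic
}
hist(boot.means)   # estimated sampling distribution of the mean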


Key Features of the Bootstrap


The draws must be independent: each observation in your
observed sample must have an equal chance of being
selected.
If observations in the original sample are NOT independent,
then the resampling must accommodate that (more on this later).
The simulated samples must be of size N to take full
advantage of the information in the sample.
Resampling must be done with replacement. If not, then
every simulated sample of size N would be identical to each
other and to the original sample.
Resampling with replacement means that in any given
simulated sample, some cases might appear more than once
while others will not appear at all.

Sampling With/Without Replacement


set.seed(61893)
Names <- c("Jeffrey", "Sung-Geun", "William", "Andrew",
           "Michael", "Kate", "Rosie", "Ahmed", "Jeff", "Tom")
N <- length(Names)

sample(Names, N, replace = FALSE)
 [1] "Sung-Geun" "William"   "Tom"       "Michael"   "Jeffrey"
 [6] "Andrew"    "Rosie"     "Jeff"      "Kate"      "Ahmed"

sample(Names, N, replace = FALSE)
 [1] "Jeffrey"   "Andrew"    "Tom"       "William"   "Ahmed"
 [6] "Jeff"      "Sung-Geun" "Michael"   "Rosie"     "Kate"

sample(Names, N, replace = TRUE)
 [1] "Tom"       "Tom"       "Kate"      "Jeff"      "Sung-Geun"
 [6] "Sung-Geun" "Michael"   "Andrew"    "Sung-Geun" "Ahmed"

sample(Names, N, replace = TRUE)
 [1] "Sung-Geun" "Kate"      "William"   "Andrew"    "Kate"
 [6] "Jeff"      "Jeff"      "Sung-Geun" "William"   "Tom"

What to Resample?
In the single variable case, you must resample from the data
itself.
However, in something like OLS, you have a choice.
Remember the "Xs fixed in repeated samples" assumption?
So, you can resample from the data, thus getting a new mix
of Xs and Y each time for your simulations.
Or you can leave the Xs fixed, resample from the residuals of
the model, and use those to generate simulated samples of Y
to regress on the same fixed Xs every time.
As before, the choice depends on the validity of the "fixed in
repeated samples" assumption, though in practice it is
unlikely to matter much.
Most folks resample from the data; a sketch of both options
follows.
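A minimal sketch of both options for a bivariate OLS model; the data are simulated purely for illustration, and the variable names are illustrative:

set.seed(42)
n <- 100
dat <- data.frame(X = rnorm(n))
dat$Y <- 1 + 2 * dat$X + rnorm(n)
fit <- lm(Y ~ X, data = dat)
R <- 1000
b.pairs <- b.resid <- numeric(R)
for (r in 1:R) {
  ## Option 1: resample the data, getting a new mix of Xs and Y each time
  d.star <- dat[sample(1:n, n, replace = TRUE), ]
  b.pairs[r] <- coef(lm(Y ~ X, data = d.star))[2]
  ## Option 2: leave the Xs fixed, resample residuals, rebuild simulated Ys
  Y.star <- fitted(fit) + sample(resid(fit), n, replace = TRUE)
  b.resid[r] <- coef(lm(Y.star ~ dat$X))[2]
}
sd(b.pairs); sd(b.resid)   # two bootstrap SEs for the slope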

Simple Example

Let's suppose I draw a sample of 10 folks and compute a
mean.
I could make a distributional assumption about that mean,
compute a standard error, and treat that as my best guess of
the population mean and its variance.
Or I can draw lots of resamples, compute a mean for each
one of them, and then plot that distribution.


Figure: Bootstrap Sim of Mean, N=10 (density of the estimated mean of X)

What Did We Learn?

The distribution of simulated sample means is close to
centered on our original sample estimate.
But the distribution is not normal, and not even symmetric.
A Bootstrap standard error would probably be better to use
than an analytic one that assumed a normal distribution.
So, how do we compute a Bootstrap standard error and,
more importantly, a Bootstrap confidence interval?
There are multiple ways.


Standard Normal Bootstrap CI


This is the simplest method; it mirrors the fully analytic
method of computing confidence intervals, and it is parametric.
It is parametric because it assumes that the statistic of
interest is distributed normally.
You generate a large simulated sample of the parameter of
interest (e.g. a mean, a regression coefficient, etc.).
Since the SE of a parameter is defined as the standard
deviation of that parameter in multiple samples, you
compute a simulated SE as just the standard deviation of
your simulated parameters.
A 95% CI is just your original sample parameter estimate
plus/minus 1.96 times your estimated SE.
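A minimal sketch, reusing x and the boot.means vector from the earlier bootstrap-of-the-mean example:

theta.hat <- mean(x)        # original sample estimate
se.boot <- sd(boot.means)   # bootstrap SE = SD of the simulated estimates
theta.hat + c(-1.96, 1.96) * se.boot   # standard normal 95% CI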


Standard Normal Bootstrap CI (2)

The advantage is its simplicity.


The disadvantage is that it makes a distributional assumption
that may not be appropriate. If normality is a good
assumption, the analytic calculation may be appropriate.
This also implicitly assumes that your estimate of the
parameter of interest is unbiased.


Percentile Bootstrap CI

The Percentile version of the Bootstrap CI is nonparametric.


This approach uses the large number of simulated
parameters of interest (e.g. means, medians, slope
coefficients) and orders them from smallest to largest.
A 95% CI is then computed by just identifying the lower CI as
the 2.5th percentile and the upper CI as the 97.5th percentile.
This leaves 95% of the simulated parameter estimates within
this range while dividing the remaining 5% of the simulated
values equally into the upper and lower tails.
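In R, the percentile CI is one line, again assuming the boot.means vector from the earlier sketch:

quantile(boot.means, probs = c(0.025, 0.975))   # 2.5th and 97.5th percentiles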


Percentile Bootstrap CI (2)


The advantage of this method is it does not make any
distributional assumption it does not even require the
distribution (or the CI) to be symmetric.
Of course, if a distributional assumption is appropriate and
you dont use it, this approach uses less information.
In addition, this method has been shown to be less accurate
than it could be.
Still, this is the most common way to compute a Bootstrap
CI.
Side Note: This approach parallels what Bayesians do when
the compute what they call Credible Intervals.


Other Bootstrap CI Methods


The Basic Bootstrap CI: The simulated parameters are
adjusted by subtracting out the observed statistic, and then
the percentile method is applied.
The Bootstrap t CI: for each simulation you compute the
sample t statistic. Select the 2.5th and 97.5th
percentile t-scores. Use those to multiply by the simulated
SE (instead of 1.96).
The BCa Bootstrap CI: This method modifies the percentile
CI to correct for bias and for skewness. Simulation work
suggests better performance than the unadjusted percentile
CI.
Which to use? How to compute them?


Which CI Method to Use?


All five are available in the boot package in R by first
computing the simulated parameters and then using the
boot.ci() function. See the Lab.
The first two are pretty simple to program yourself.
They all converge toward the true population
distribution of the parameter in question as:
the original sample size increases toward infinity
the number of resamples you draw increases toward infinity (if
the original sample is large enough)
Rules of Thumb: Replications of 1,000; a sample size of 30-50
is no problem, but smaller can work if the sample is not too
odd.
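A sketch with the boot package, using the mean of x (from the earlier example) as the statistic of interest; boot() requires the statistic as a function of the data and a vector of resampled indices:

library(boot)
mean.fun <- function(d, idx) mean(d[idx])   # statistic(data, indices) form
boot.out <- boot(data = x, statistic = mean.fun, R = 1000)
boot.ci(boot.out, conf = 0.95,
        type = c("norm", "basic", "perc", "bca"))   # four of the CI types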


Does this Always Work?


To some, the bootstrap seems like magic.
However, it is still fundamentally dependent on the quality of
the original sample you have in your hands.
If the original sample is not representative of the population,
the simulated distribution of any statistics computed from
that sample will also probably not accurately reflect the
population. (Small samples, biased samples, or bad luck)
Also, the bootstrap simulated distribution of a sample
statistic is necessarily discrete, whereas often the underlying
population PDF is continuous. They converge as sample size
increases, but the simulated distribution remains discrete.
Nothing is perfect!


Bootstrapping Complex Data


Resampling one observation at a time with replacement
assumes the data points in your observed sample are
independent. If they are not, the simple bootstrap will not
work.
Fortunately the bootstrap can be adjusted to accommodate
the dependency structure in the original sample.
If the data is clustered (spatially correlated, multi-level, etc.)
the solution is to resample clusters of observations one at a
time with replacement rather than individual observations.
If the data is time-serially dependent, this is harder because
any sub-set you select still breaks the dependency in the
sample, but methods are being developed.
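A sketch of the clustered case: resample cluster ids with replacement and rebuild the data set from the drawn clusters. The data here are simulated purely for illustration (20 clusters of 5 observations each):

set.seed(99)
dat <- data.frame(id = rep(1:20, each = 5), x = rnorm(100))
ids <- unique(dat$id)
R <- 1000
boot.stat <- numeric(R)
for (r in 1:R) {
  draw <- sample(ids, length(ids), replace = TRUE)   # resample whole clusters
  d.star <- do.call(rbind, lapply(draw, function(g) dat[dat$id == g, ]))
  boot.stat[r] <- mean(d.star$x)   # statistic of interest on the rebuilt sample
}
sd(boot.stat)   # cluster-bootstrap SE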


When Does it Not Work Well?

Data with serial correlation in the residual (as noted on the
last slide).
Models with heteroskedasticity (other than unit-specific, a la
clustered data) when the form of the heteroskedasticity is
unknown. One approach here is to sample pairs (of Y and
X) rather than leaving X fixed in repeated samples.
Simultaneous equation models (because you have to
bootstrap all of the endogenous variables in the model).


An Example
Londregan and Snyder (1994) compare the preferences of
legislative committees with the entire legislative chamber to
test if committees are preference outliers.
Competing theories:
Committees will be preference outliers due to self-selection
and candidate-centered incentive to win re-election.
Committees will NOT be preference outliers because the floor
assigns members to develop expertise for the floor to follow.

Empirical work is mixed. What are the problems?


Ideology scores are measured with error, but that error is
ignored.
Too many use analytic tests on two-sample differences of
means when they should use non-parametric tests (e.g.
bootstraps) on differences of medians.

Londregan and Snyder (cont.)


Two-sample tests fail when there are more than two groups
and some people are part of more than one group.
Two-sample tests treat all heterogeneity on a committee as
sampling error.
Theory is about the median voter, but tests of sample means
do not fit that theory.
Between measurement error issues and concerns about the
statistical properties of medians, they resampled among
legislators to estimate committee and floor medians, and
then estimated how far apart they'd have to be to be
considered significantly different.
Results:


Figure: Londregan and Snyder (1994) results (Fig1.pdf)

One More Example

Benoit, Laver, and Mikhaylov (2009) analyze texts of the
Comparative Manifesto Project (CMP), which uses party
manifestos to measure ideological locations.
Current methods fail to consider measurement error in these
types of measures.
Use bootstraps to estimate this variability so it can be
accounted for in subsequent regression models.
The CMP data is extremely influential.


Benoit et al (cont.)

Use a bootstrap to estimate uncertainty, then use methods
that accommodate uncertainty.
They find that many reported differences between parties are
probably not real differences, but rather due to random noise
in the measures that others failed to consider.
Example of French Parties: You find out that the
Communist, Socialist, Green, and Union for a Popular
Movement parties are probably not statistically significantly
different from each other. Only the far-right National Front
is clearly different.


Figure 3: Left-Right Placement of the Major French Parties in 2002. Bars indicate 95% confidence intervals.

The Jackknife
The Jackknife emerged before the bootstrap.
Its primary use has been to compute standard errors and
confidence intervals, just like the bootstrap.
It is a resampling method, but it is based on drawing n
resamples, each of size n-1, because each time you drop out a
different observation.
The notion is that each sub-sample provides an estimate of
the parameter of interest on a sample that can easily be
viewed as a random sample from the population (if the
original sample was) since it only drops one case at a time.
NOTE: You can leave out groups rather than individual
observations if the sampling/data structure is complex (e.g.
clustered data).
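A minimal jackknife sketch for the mean of a numeric vector x (such as the one simulated earlier), using the usual jackknife variance formula:

n <- length(x)
jack <- numeric(n)
for (i in 1:n) {
  jack[i] <- mean(x[-i])   # statistic with the ith observation dropped
}
se.jack <- sqrt((n - 1) / n * sum((jack - mean(jack))^2))
se.jack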

Jackknife (2)

The jackknife is less general than the bootstrap, and thus,
used less frequently.
It does not perform well if the statistic under consideration
does not change smoothly across simulated samples.
It does not perform well in small samples because you don't
end up generating many resamples.
However, it is good at detecting outliers/influential cases.
Those sub-sample estimates that differ most from the rest
indicate the cases that have the most influence on those
estimates in the original full sample analysis.


A Digression to Cook's D
The jackknife works very similarly to Cook's Distance (or
Cook's D), which is a measure of how influential individual
observations are on statistical estimates (in OLS).

D_i = \frac{\sum_j (\hat{Y}_j - \hat{Y}_{j(i)})^2}{k\,\hat{\sigma}^2}   (1)

Where:
k = the number of parameters in the model
\hat{Y}_j = the predicted value for the jth observation from the full model
\hat{Y}_{j(i)} = the predicted value for the jth observation after the ith observation has been dropped
\hat{\sigma}^2 = the mean squared error of the model

Large values of D indicate influential points.
Large = values greater than 4/(n - k - 1) [n = sample size
and k = number of parameters estimated in the model].
Let's see an example.
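In R, Cook's D is available directly via cooks.distance(). A sketch for the poverty example on the next slides; the data frame and variable names here are hypothetical stand-ins:

fit <- lm(poverty ~ income, data = states)   # 'states', 'poverty', 'income' are stand-ins
d <- cooks.distance(fit)
cutoff <- 4 / (length(d) - length(coef(fit)) - 1)   # the 4/(n - k - 1) rule of thumb
which(d > cutoff)                # flag influential observations
plot(d, type = "h"); abline(h = cutoff, lty = 2)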

Plot of Poverty and Per Capita Income Using State Postal Codes

Figure: State Level Poverty Rate as a Function of Per Capita Income (scatterplot of Percent in Poverty against Per Capita Income, with states labeled by postal code)

Looking for Outliers

Figure: Cook's Distance Plot from Model of State Level Poverty Rate as a Function of Per Capita Income (states on the horizontal axis; DC stands out with by far the largest distance)

Jackknife-after-Bootstrap

Rizzo (2008, pp. 195-6) suggests combining the bootstrap
and jackknife procedures.
First, you run the bootstrap to generate your bootstrap
estimates of the parameter of interest.
Then you run a jackknife by dropping all bootstrap samples
that include the ith observation, then summarizing across
these jackknifed samples.
The procedure is available in the boot package in R.
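In code, this is a one-liner once you have a boot object (reusing boot.out from the earlier boot package sketch):

library(boot)
jack.after.boot(boot.out)   # jackknife-after-bootstrap influence plot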


Permutation/Randomization
Just another form of resampling, but in this case it is done
without replacement.
They have been around since Fisher introduced them in the
1930s.
Often used to conduct hypothesis testing where the Null is
zero.
Rather than assume a distribution for the Null hypothesis, we
simulate what it would be by randomly reconfiguring our
sample lots of times (e.g. 1,000) in a way that breaks the
relationship in our sample data.
The question then is how often do these permutations or
randomly reshuffled data sets produce a relationship as large
or larger than the one we saw in our original sample?

Running a Permutation Experiment

Suppose we are testing the difference in means between Men
and Women on some variable.
If we have N_M Men and N_W Women, such that
N = N_M + N_W, then the total number of possible
permutations where the first group equals the number of
men and the total sample size stays the same is
\frac{(N_M + N_W)!}{N_M!\,N_W!}
Suppose we have a sample of 14 Men and 12 Women. If so,
then we have \frac{(14+12)!}{14!\,12!} = 9,657,700 possible permutations.
Thus, in most settings, we randomly generate some of the
possible permutations and call it good.
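You can verify the count in R with choose():

choose(26, 14)   # ways to assign 26 people to groups of 14 and 12
[1] 9657700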


Additional Considerations

Randomization tests do assume exchangeability: if the Null
of no effect is true, the observed outcomes across individuals
should be similar no matter what the level of the treatment
(X) variable is.
This is a weaker assumption than iid.
If we do examine all possible permutations, that is often
called a permutation test, or an exact test.
If we just simulate a large number, it's called a
randomization test.
What do we reshuffle? Most reshuffle the treatment.


A Simple Example

I have data on the weight of chicks and what they were fed.
The samples are small, and the distributions unknown.
Still, I want to know if their weights differ based on what
they were fed.
In a parametric world, I'd do a two-sample difference of
means t-test.
But that is only appropriate if the distributional assumption
holds.


Randomization Example

attach(chickwts)
x <- sort(as.vector(weight[feed == "soybean"]))
y <- sort(as.vector(weight[feed == "linseed"]))
x
 [1] 158 171 193 199 230 243 248 248 250 267 271 316 327 329
y
 [1] 141 148 169 181 203 213 229 244 257 260 271 309


Are the Two Groups Different?

Sample.T <- t.test(x, y)
Sample.T
data:  x and y
t = 1.3246, df = 23.63, p-value = 0.198
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -15.48547  70.84262
sample estimates:
mean of x mean of y
 246.4286  218.7500


Setting Up the Test

set.seed(6198)
R <- 999
z <- c(x, y)                    # pool the two samples
K <- 1:length(z)
reps <- numeric(R)
t0 <- t.test(x, y)$statistic    # observed t statistic
for (i in 1:R) {
  k <- sample(K, size = 14, replace = FALSE)   # reshuffle: 14 cases to group 1
  x1 <- z[k]
  y1 <- z[-k]
  reps[i] <- t.test(x1, y1)$statistic
}


Results

What proportion of t-test scores were at or above the one we
observed in our sample?
p <- mean(c(t0, reps) >= t0)
p
[1] 0.903

Note that I included the actual sample estimate in the
calculation.
This p-value is larger than .05, so we'd fail to reject the Null
of no difference if we were using a 95% cut-off.
What does the distribution of the t-tests look like?


Figure: Density function of simulated t-tests (simulated values of t on the horizontal axis, density on the vertical axis)

What Did We Learn?

It does not look like the means of these two groups differ
significantly (at least at the .05 level of significance).
We can compare any aspect of these two samples the same
way: compute the statistic every time for a thousand
replications and then look at the distribution.
In fact, there are tests to evaluate whether the two
distributions are statistically significantly different or not.


Density functions of Chick Weights

Figure: Density of chick weight in grams for the soybean and linseed groups

Example: Legislative Networks


Legislators form networks of cooperation, in this case, via
co-sponsorship of bills.
Those connections are intentional actions that signal
relationships.
Does party structure those relationships? We can measure
network modularity due to partisanship.
Modularity measures how well a division separates a network
into distinct groups by measuring the number of ties within a
group versus the number of ties between groups.
But what is the distribution of modularity? Let's estimate it
rather than assume it.


Kirkland (2011)
Modularity is bounded between -1 and 1, but it has no known
distribution.
Kirkland simulates that distribution by randomly partitioning
the network 25,000 times (basically randomly assigning
legislators to two teams).
The population PDF is then estimated by the 25,000
modularity statistics computed on randomly partitioned
networks.
Use a percentile method to compute a 95% confidence
interval, and compare the observed modularity in a chamber
to this null distribution.
Party matters.


Figure: Distribution of Party Modularity across Lower State Legislative Chambers (observed party modularity by state, plotted against the null modularity region and the average U.S. House modularity)

Comparing Methods
Bootstrap is the most flexible and most powerful. It can be
extended to any statistic or calculation you might make
using sample data.
Bootstrapping does NOT make the exchangeability
assumption that randomization tests make.
Jackknife is limited by sample size.
Permutation/Randomization methods break all relationships
in the data; they don't let you produce a covariance matrix.
[But what if we reshuffled just on Y?]
I think Bootstrap confidence intervals, etc., will be standard
in empirical social science in 5-10 years.


Posterior Simulation (PS)


Definition: a simulation-based approach to understanding
patterns in our data
Of course, we want to go beyond our data to draw inference
about the population from which it came.
A straightforward way to go beyond simple tables of
regression coefficients
Calculate quantities of interest (QI)
Account for uncertainty

Uses Bayesian principles, but does not require Bayesian


models
Example: CLARIFY in Stata (King, Tomz, and Wittenberg
2000)


Posterior Simulation (PS)


Key assumption: coefficients/SEs we estimate are drawn
from a probability distribution that describes the larger
population
Coefficients define the mean, SEs define the variance

With a large enough sample size, according to the central limit
theorem, this distribution is multivariate normal.
Instead of a bell curve, imagine a jello mold that can take on
different colors, flavors, and textures.

The goal of PS: make random draws from this distribution to
simulate many hypothetical values of the coefficients.
Instead of drawing one single number, as with rnorm(), we
draw a vector of numbers (one for each coefficient).


Posterior Simulation (PS)

The next step: choose a QI
Expected value, predicted probability, odds ratio, first
difference, change in hazard rate, etc.

Set a key variable in the model to a theoretically interesting
value and the rest to their means or modes
Calculate that QI with each set of simulated coefficients
Set the variable to a new value
Calculate that QI with each set of simulated coefficients
Repeat as appropriate (see the sketch below)
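A sketch of the whole PS routine for a logit model; the data are simulated and the names are illustrative. mvrnorm() comes from the MASS package:

library(MASS)
set.seed(7)
n <- 500
x <- rnorm(n); w <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x + 0.5 * w))
m <- glm(y ~ x + w, family = binomial)
S <- mvrnorm(1000, mu = coef(m), Sigma = vcov(m))   # simulated coefficient vectors
x.vals <- seq(-2, 2, by = 0.5)   # theoretically interesting values of the key variable
qi <- sapply(x.vals, function(v) {
  X <- c(1, v, mean(w))               # key variable set to v, w held at its mean
  p <- plogis(S %*% X)                # the QI: a predicted probability per draw
  quantile(p, c(0.025, 0.5, 0.975))   # summarize the distribution of QIs
})
qi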


Posterior Simulation (PS)

At every value of the variable, we now have many
calculations of the QI
Final step: efficiently summarize the distribution of the
computed QI at each value of our variable
Most common: point estimate and confidence intervals
Can represent this in a table or graph (we will do a graph
example)


Advantages of PS

Provides more information than just a table of regression
output
Accounts for uncertainty in the QI
Flexible to many different types of models, QIs, and variable
specifications
After doing it once, easy to use
Can be much easier than working with analytic solutions


Limitations of PS

Relies on CLT to justify asymptotic normality


Fully Bayesian model using MCMC could produce exact
finite-sample distribution
Bootstrapping would require no distributional assumption

Computational intensity
Large models can produce lots of uncertainty around quantity
of interest


Motivation for Cross-Validation (CV)


A key component of scientific research is the independent
assessment and testing of our theories.
Lave and March (1979) model of theory building:
Observe something in the world
Speculate about why it appears the way it does (develop a
theory)
If your theory were true, what else would you expect to
observe?
Testing your theory involves exploring that "what else."

The "what else" might involve other dependent variables,
but it might also involve the same dependent variable in an
independently drawn sample of data.
The key is: data that helps you build your theory cannot
also be used as an independent test of that theory.

Motivation for CV (2)

Using the same data to build and test a theory leads to
over-fitting a statistical model to your sample.
Such over-fitting to the sample captures aspects of the true
population DGP you care about...
But it also captures elements that are peculiar to your
particular sample and NOT reflective of the true DGP.
Thus, over-fitting a model to your sample actually leads to
worse/less accurate inference about the population from
which it came.


Motivation for CV (3)

How can you avoid using the same data to build and then
test your theory?
Develop your theory, specify your model, etc. before looking
at your data, then run the statistical analysis you planned
one time and write it up.
Use the data you have to build your model, then collect fresh
data to test it.
Divide the data you have so you use some of it to build your
model and some of it to independently test it.
This last option is cross-validation.


Cross-Validation (CV)

Definition: a method for assessing a statistical model on a
data set that is independent of the data set used to fit the
model
Often used in disciplines like computer science that focus on
predictive models
Goal: guard against Type III errors, the testing of
hypotheses suggested by the data (i.e., overfitting)
Many different types, based on different ways of constructing
the independent data
Many different fit statistics can be calculated in a CV routine


CV Example: The Netflix Prize


Key component of Netflix: recommend the right movies to
the right people
Main data source: customers' own ratings of movies they
have seen
2009: $1 million prize for beating Netflix's current prediction
system
Netflix provided 100 million ratings from 480,000 users of
18,000 movies
Teams developed models predicting ratings in these data
Submissions were then evaluated on 2.8 million ratings not
included in the data given to teams


CV Example: The Netflix Prize


The winning team beat Netflix's own system by 10% as
judged by mean squared error
Why was it important to set aside the 2.8 million ratings for
model evaluation?
If the evaluation had been done on the 100 million, teams could
have gamed the system:
Find the odd quirks of that particular sample that came about
due to random sampling
Overfit the model to account for these odd quirks
This would make the model look really good on this one
sample, but it wouldn't be generalizable to other samples

For more information: http://www.netflixprize.com/assets/GrandPrize2009BPCBellKor.pdf


CV Example: Predicting Divorce

Psychologist featured in Malcolm Gladwell's book Blink
Research on whether a couple will get divorced based on
watching them argue for 15 minutes
Videotaped 57 couples, coded several variables
Claims 80-90% accuracy (amazing!)
But there are problems...


CV Example: Predicting Divorce

Predictive formula developed with knowledge of the
couples' marriage outcomes
Then the formula was applied to those same couples
Are we still amazed that he got 80% right?
A better test: get a new sample of couples and apply the
formula to them
For more information:
http://www.slate.com/id/2246732/


Stagflation

The 1970s saw a period thought impossible in a modern
economy: high unemployment AND high inflation.
Most statistical models of the economy at the time failed to
predict this. Why?
Forecasters used massively large models that included
hundreds of variables. This resulted in forecasting failure due
to:
Massive uncertainty in a model that is estimating
hundreds/thousands of parameters.
Over-fitting the sample data.


CV Examples

These examples show us the importance of out-of-sample
prediction
There are always oddities in a particular sample
We don't want to fit our models to those oddities
CV only rewards models for picking up general patterns that
appear across samples
The problem: where do we get a new sample?


General CV Steps

Randomly partition the available data into a training set and
a testing set
Fit the model on the training set
Take the parameter estimates from that model and use them to
calculate a measure of fit on the testing set
Can repeat steps 1-3 several times and average to reduce
variability (see the sketch below)
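A sketch of these steps as split-sample CV with a simple linear model; the data are simulated purely for illustration:

set.seed(11)
n <- 200
cv.dat <- data.frame(x = rnorm(n))
cv.dat$y <- 1 + 2 * cv.dat$x + rnorm(n)
train.idx <- sample(1:n, size = n / 2)   # step 1: random 50/50 partition
train <- cv.dat[train.idx, ]
test <- cv.dat[-train.idx, ]
fit <- lm(y ~ x, data = train)           # step 2: fit on the training set
pred <- predict(fit, newdata = test)     # step 3: score the testing set
sqrt(mean((test$y - pred)^2))            # out-of-sample RMSE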


Two Types of CV

Split-sample CV:
Partition into 50% training, 50% testing (could also do
75/25, 80/20, etc.)
Usually want to maximize the size of the training set
Particularly common in time series analysis, where the testing
data are generally the most recent years for which data are
available.

Leave-one-out CV:
Iterative method with number of iterations = sample size
Each observation becomes the testing set one time
Note the parallel to the Jackknife and Cook's D.


Leave-One-Out CV

Delete observation #1 from the data
Fit the model on observations #2 through #n
Apply the coefficients from step #2 to observation #1,
calculate the chosen fit measure
Delete observation #2 from the data
Fit the model on observations #1 and #3 through #n
Apply the coefficients from step #5 to observation #2,
calculate the chosen fit measure
Repeat until all observations have been deleted once
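These steps in R, reusing the cv.dat data frame from the split-sample sketch:

errs <- numeric(nrow(cv.dat))
for (i in 1:nrow(cv.dat)) {
  fit.i <- lm(y ~ x, data = cv.dat[-i, ])               # fit with observation i deleted
  errs[i] <- cv.dat$y[i] - predict(fit.i, cv.dat[i, ])  # apply coefficients to obs i
}
sqrt(mean(errs^2))   # leave-one-out RMSE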


Limitations of CV

Training and testing data must be random samples from the
same population (Why?)
Will show the biggest differences from in-sample measures when
n is small (Why?)
Higher computational demand than calculating in-sample
measures
Subject to the researcher's selection of an appropriate fit statistic

