Advanced Statistics
Lecture 1
Homework and Exam
Homework assignments
● 2 x 2 hours TA sessions per week (Tuesday & Thursday 11-13h, same room)
sent via email. They should “run out of the box” to receive full points.
Exam
● There will be a written exam in the last session, on Thursday 29th January
Contact
● Christoph Weniger (c.weniger@uva.nl)
● Many examples
● In the first week of the present course, we will briefly repeat some
of the most relevant material from SM as a reminder and to
provide context for the rest of the course.
Understanding statistical tools matters
This course is about why statistical methods work.
● Estimators
● Bayesian analyses
● Basics: Evidence, Model selection, Credible intervals
● ...
The two grand schools of statistical analysis
Frequentist (Fisher): Cause → Phenomena
“Given a cause, what is the frequency (in repeated experiments) of a certain phenomenon to occur?”
Bayesian (Bayes): Phenomena → Possible causes
“How does an observed phenomenon change my belief in different possible causes?”
Probabilities in a nutshell
“Probabilities” mean here
● Frequencies of events (in 1/6 of the cases the die shows a six)
● Probabilities for mutually exclusive and exhaustive elementary events/propositions $A_i$ sum to one: $\sum_i P(A_i) = 1$
● Structural consistency (the result does not depend on the way of reasoning) is guaranteed by the
rule for conditional probabilities: $P(A, B) = P(A|B)\, P(B) = P(B|A)\, P(A)$
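Bayes' theorem
$P(\theta|x) = \frac{P(x|\theta)\, P(\theta)}{P(x)}$
Here $P(x)$ is the model evidence or global likelihood.
It is a direct consequence of the rule for conditional probabilities:
$P(\theta|x)\, P(x) = P(x|\theta)\, P(\theta)$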
Notes:
● Bayes' theorem provides a rule for how to update the probability or plausibility of
It is in general not equal to the posterior, which is most obvious when looking at the
normalization of the two functions (with x and θ being data and model parameters,
respectively).
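The normalization remark can be checked numerically. The following is a minimal sketch, assuming the remark refers to the likelihood and using a binomial model purely for illustration: the number of successes k plays the role of the data x and the success probability p the role of the parameter θ. The function name and the numbers n, p_fixed and k_obs are hypothetical choices, not taken from the lecture.

```python
from math import comb

def binomial_likelihood(k, n, p):
    """P(k | n, p): probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 10  # hypothetical number of trials

# Summed over the data k at fixed parameter p, the likelihood is normalized.
p_fixed = 0.3
sum_over_data = sum(binomial_likelihood(k, n, p_fixed) for k in range(n + 1))

# Integrated over the parameter p at fixed data k, it is not normalized:
# a midpoint-rule integral over p in [0, 1] gives 1 / (n + 1) instead of 1.
k_obs = 3
steps = 10_000
integral_over_param = sum(
    binomial_likelihood(k_obs, n, (i + 0.5) / steps) for i in range(steps)
) / steps

print("sum over data:      ", sum_over_data)
print("integral over param:", integral_over_param)
```

Seen as a function of the parameter, the likelihood therefore has to be multiplied by a prior and divided by the evidence before it can be read as a probability density for θ.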
Typical Frequentist questions
There is a new test for the Schnitzler syndrome (c) with the
characteristics:
● 5% false positive
● Maybe nobody had the Schnitzler syndrome in the last 100 years
→ The chance that you have the disease could still be very low.
One could account for hidden trials by making abstract statements about the frequency of wrong
and right statistical statements (instead of observations). This is exactly what Bayesian inference
forces us to do from the start.
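One simple way to see why hidden trials matter is to ask how often at least one false positive appears when the same test is repeated. This is only a sketch under the assumption that the trials are independent; the 5% rate is the one quoted above, while the function name and the numbers of trials are hypothetical.

```python
def prob_any_false_positive(alpha, n_trials):
    """Probability that at least one of n independent tests gives a false positive."""
    return 1.0 - (1.0 - alpha) ** n_trials

alpha = 0.05  # false-positive rate quoted above
for n_trials in (1, 10, 50):  # hypothetical numbers of hidden trials
    print(n_trials, prob_any_false_positive(alpha, n_trials))
```

The probability grows quickly with the number of trials, so a single reported "positive" means little unless the number of attempts is known.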
Typical Bayesian questions
There is a new test for the Schnitzler syndrome with the
characteristics:
● 5% false positive
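The Bayesian version of the question can be sketched with Bayes' theorem for a positive test result. Only the 5% false-positive rate comes from the text; the true-positive rate (sensitivity) and the prior prevalences below are hypothetical values chosen for illustration, as is the function name.

```python
def posterior_disease(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) from Bayes' theorem."""
    # Evidence: total probability of a positive test result.
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence

false_positive_rate = 0.05  # from the test characteristics above
sensitivity = 0.99          # ASSUMPTION: hypothetical true-positive rate
for prior in (1e-6, 1e-3, 1e-1):  # ASSUMPTION: hypothetical prevalences
    print(prior, posterior_disease(prior, sensitivity, false_positive_rate))
```

For a rare disease the posterior stays small even after a positive test, which makes the dependence on the prior explicit.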
Frequentist
Pro:
● No prior dependence (what is the prior for a flat Universe?)
Con:
● Be aware of hidden trials
Bayesian
Pro:
● Prior dependence is formalized
● Reasoning about causes, not
Con:
● Results are difficult to show in a
● Median xm
● Mode
● Skewness
Binomial distribution
Number of successes k in n draws, each with individual success probability p: $P(k|n, p) = \binom{n}{k}\, p^k (1 - p)^{n - k}$
Poisson distribution
Normal distribution
Notes:
● Its central importance comes from the central limit theorem
● Many random variables are normally distributed in practice, but the reasons for
Chi-squared distribution
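Describes the statistical distribution of outliers: the sum of the squares of k independent standard normal variables follows a chi-squared distribution with k degrees of freedom.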
Log-normal distribution
● ln(x) is normally distributed
from data
Many interrelations
Sum rules
Solid: relations
Dashed: approximation
Important approximations
● Binomial to Poisson
● Poisson to Normal
● Chi-squared to Normal
From http://www.johndcook.com/blog/distribution_chart/
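Two of the approximations listed above can be checked numerically. This is a minimal sketch using only the Python standard library; the helper names and the parameter values (n, p, lam) are illustrative assumptions, and the chi-squared case is not covered here.

```python
from math import comb, exp, lgamma, log, pi, sqrt

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # Evaluated in log space to avoid overflow for large k and lam.
    return exp(k * log(lam) - lam - lgamma(k + 1))

def normal_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# Binomial -> Poisson: many trials n, small p, mean lam = n * p.
n, p = 1000, 0.003
max_diff_binom_poisson = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in range(21)
)

# Poisson -> Normal: large mean lam, with mu = lam and sigma = sqrt(lam).
lam = 100
max_diff_poisson_normal = max(
    abs(poisson_pmf(k, lam) - normal_pdf(k, lam, sqrt(lam)))
    for k in range(50, 151)
)

print("binomial vs Poisson:", max_diff_binom_poisson)
print("Poisson vs normal:  ", max_diff_poisson_normal)
```

Both maximum differences are small compared with the peak values of the distributions, and they shrink further as n grows at fixed n * p, or as lam grows.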
The central role of the normal distribution
The Central Limit Theorem (CLT)
Example: chi-squared distribution
Notes:
● The CLT holds for a very large number of underlying
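B) Cumulants. Defined by the Taylor expansion of the log of the characteristic function $\phi(t) = \langle e^{itx} \rangle$: $\ln \phi(t) = \sum_{n=1}^{\infty} \kappa_n \frac{(it)^n}{n!}$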
Proof of the Central Limit Theorem
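The first three cumulants $\kappa_n$ are functions of the mean, the variance and the skew:
$\kappa_1 = \mu$ (mean), $\kappa_2 = \sigma^2$ (variance), $\kappa_3 = \langle (x - \mu)^3 \rangle$
For the normal distribution: $\kappa_n = 0$ for $n \geq 3$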
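To see what this means, we rescale x such that the variance equals one. For the sum of N independent, identically distributed variables the cumulants add, so before rescaling the n-th cumulant of the sum is $N \kappa_n$, with $\kappa_n$ the n-th cumulant of a single variable. Rescaling $x \to x / \sqrt{N \kappa_2}$ implies
$\kappa_n \to \frac{N \kappa_n}{(N \kappa_2)^{n/2}} = N^{1 - n/2}\, \frac{\kappa_n}{\kappa_2^{n/2}}$
and in the large N limit
$\kappa_n \to 0$ for $n \geq 3$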
Hence, for large enough values of N, only the first two cumulants are important.
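The suppression of higher cumulants can be illustrated with the chi-squared distribution, which for k degrees of freedom is the sum of k independent squared standard normal variables. The sketch below uses the known closed form for its cumulants, $\kappa_n = 2^{n-1} (n-1)!\, k$, and rescales to unit variance; the function names are hypothetical and k plays the role of N.

```python
from math import factorial

def chi2_cumulant(n, k):
    """n-th cumulant of a chi-squared distribution with k degrees of freedom."""
    return 2 ** (n - 1) * factorial(n - 1) * k

def standardized_cumulant(n, k):
    """n-th cumulant after rescaling to unit variance: kappa_n / kappa_2^(n/2)."""
    return chi2_cumulant(n, k) / chi2_cumulant(2, k) ** (n / 2)

# The third and fourth standardized cumulants fall off as k^(-1/2) and k^(-1).
for k in (1, 10, 100, 1000):
    print(k, standardized_cumulant(3, k), standardized_cumulant(4, k))
```

The third standardized cumulant equals sqrt(8/k) and the fourth equals 12/k, so both vanish for large k and the rescaled distribution approaches a normal one.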