
Advanced Statistical Methods

Lecture 1
Homework and Exam

Homework assignments
● 2 × 2-hour TA sessions per week (Tuesday & Thursday, 11–13h, same room)
● Homework is handed out at the beginning of a TA session and should be handed in one week later, at the end of the TA session
● Help on the homework is provided during TA sessions
● Exercises require analytic work as well as numerical work on the computer
● Homework can be handed in hand-written, or sent via email (PDF)
● For numerical work, programs should be written as IPython notebooks and sent via email. They should “run out of the box” to receive full points.

Exam
● There will be a written exam in the last session, on Thursday 29th January
● The total grade depends on both the homework assignments (60%) and the exam (40%)

Contact
● Christoph Weniger (c.weniger@uva.nl)

● Michael Feyereisen (m.r.feyereisen@uva.nl)

● Richard Bartels (richard.t.bartels@gmail.com)

Slides & homework: https://staff.fnwi.uva.nl/c.weniger/


Later: Blackboard
Recommended Literature

Glen Cowan, Statistical Data Analysis, Oxford Science Publications, 1998
● Frequentist analysis, well known in particle physics
● Bonus: Monte Carlo methods and unfolding

R. J. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, The Manchester Physics Series, 1988
● Traditional and very good book on Frequentist data analysis

P. Gregory, Bayesian Logical Data Analysis for the Physical Sciences, Cambridge University Press, 2005
● Bayesian “bible”, conceptual introduction
● Many examples

J. V. Wall and C. R. Jenkins, Practical Statistics for Astronomers, Cambridge Observing Handbooks for Research Astronomers, 2003
● Practical book for data analysis in astronomy (both Frequentist and Bayesian)
● Many examples
Connection to Phil's Course

● Since most of you attended the course by Phil Uttley on Statistical Methods in Astrophysics and Astronomy (SM), I will assume that you know many of the basics, and continue from there.
● SM was based on Simon Vaughan's book Scientific Inference: Learning from Data.
● In the first week of the present course, we will briefly review some of the most relevant material from SM, as a reminder and to provide context for the rest of the course.
Understanding statistical tools matters
This course is about why statistical methods work.

● In cases where the experimental result is clear, the details of the statistical method often do not matter.
● In many cases, it is enough to apply standard statistical recipes (normal distribution, error propagation) to get reasonable results.
● However, when describing weak signals, close to the experimental threshold, the details of the statistical method are crucial.
● Assumptions underlying the standard recipes might be violated.
● It is important to understand not only how, but why statistical inference works. This is what the present course aims to do.
Overview

● Introduction: Bayesian and Frequentist statistics
● Probability distribution functions & central limit theorem
● Frequentist analyses (first four lectures)
  ● Hypothesis testing
  ● Estimators
  ● Confidence intervals & Wilks' theorem
  ● Profile likelihood technique & pitfalls
  ● Trial factors & coverage
  ● Numerical minimizers
● Bayesian analyses
  ● Basics: evidence, model selection, credible intervals
  ● Priors: flat prior, Jeffreys' prior, non-informative priors
  ● Sampling techniques: Markov Chain Monte Carlo, MultiNest
● Applications & advanced material
  ● Principal component analysis
  ● Angular power spectrum
  ● Bootstrapping and jackknife
  ● ...
The two grand schools of statistical analysis

Frequentist (Fisher)
● Deductive logic: cause → phenomena
● Based on “frequencies” of phenomena
● Central quantity: the “p-value”
● “Given a cause, what is the frequency (in repeated experiments) of a certain phenomenon to occur?”

Bayesian (Bayes)
● Inductive logic: phenomena → possible causes
● Probabilistic extension of logic
● Central quantity: the “posterior distribution”
● “How does an observed phenomenon change my belief in different possible causes?”
Probabilities in a nutshell
“Probabilities” here mean
● Frequencies of events (in 1/6 of the cases the die shows a six)
● Plausibility of, or belief in, a proposition (the belief in “The Higgs boson exists.”)

The most relevant rules
● Degrees of plausibility are represented by real numbers between 0 (not realized) and 1 (realized)
● Probabilities for mutually exclusive and exhaustive elementary events/propositions sum to one: $\sum_i P(A_i|I) = 1$ (I indicates background information)
● An event/proposition is either true or false (inference in binary logic)
● Structural consistency (the result does not depend on the way of reasoning) is guaranteed by the rule for conditional probabilities: $P(A, B|I) = P(A|B, I)\, P(B|I)$
● Elementary events/propositions follow the rules of set theory

(For a full discussion and derivation from fundamental requirements for consistent reasoning, see Chapter 2.5 in Gregory. A small numerical check of the two rules follows below.)
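A minimal sketch of the sum and product rules (my own illustration, not from the slides), using a fair die as the sample space:

# Verify the sum rule and the conditional-probability (product) rule
# on a fair die. Events A = "even", B = "greater than three".
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]                 # sample space
P = {x: Fraction(1, 6) for x in omega}     # elementary probabilities

# Sum rule: mutually exclusive, exhaustive outcomes sum to one
assert sum(P.values()) == 1

def prob(event):
    """Probability of an event given as a set of elementary outcomes."""
    return sum(P[x] for x in event)

A, B = {2, 4, 6}, {4, 5, 6}
P_A_and_B = prob(A & B)                    # 1/3
P_B = prob(B)                              # 1/2
P_A_given_B = P_A_and_B / P_B              # 2/3

# Product rule: P(A, B) = P(A | B) P(B)
assert P_A_and_B == P_A_given_B * P_B
print(P_A_and_B, P_A_given_B, P_B)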
Bayes' Theorem

$$P(H|D, I) = \frac{P(D|H, I)\, P(H|I)}{P(D|I)}$$

Here $P(H|D, I)$ is the posterior, $P(D|H, I)$ the likelihood function, $P(H|I)$ the prior, and $P(D|I)$ the model evidence or global likelihood.

It is a direct consequence of the rule for conditional probabilities: $P(H, D|I) = P(H|D, I)\, P(D|I) = P(D|H, I)\, P(H|I)$.

Notes:
● Bayes' theorem provides a rule for how to update the probability or plausibility of a certain hypothesis H to be true in light of data D. This always depends on additional background information I, which is often not made explicit.
● Frequentists are interested in likelihood functions only. The likelihood is in general not equal to the posterior, which is most obvious looking at the normalization of the functions (with x and θ being data and model parameters, respectively): $\int P(x|\theta)\, dx = 1$, but in general $\int P(x|\theta)\, d\theta \neq 1$, whereas $\int P(\theta|x)\, d\theta = 1$ (see the sketch below).
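A minimal numerical illustration of this normalization remark (my own sketch, not from the slides), using a binomial likelihood with hypothetical data:

# A binomial likelihood P(k|p) sums to one over the data k,
# but its integral over the parameter p is not one.
from scipy.stats import binom
from scipy.integrate import quad

n, k = 10, 3                   # hypothetical data: 3 successes in 10 trials

# Sum over all possible data outcomes k, at fixed success probability p:
p_fixed = 0.4
print(sum(binom.pmf(kk, n, p_fixed) for kk in range(n + 1)))   # -> 1.0

# Integral over the parameter p, at fixed observed data k:
integral, _ = quad(lambda p: binom.pmf(k, n, p), 0, 1)
print(integral)                # -> 1/(n+1) = 0.0909..., not 1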
Typical Frequentist questions
There is a new test for the Schnitzler syndrome with the following characteristics:

● 5% false positives
● 10% false negatives

You order the test online, and get a positive result. Should you be worried?

The Frequentist says:
“I can exclude the null hypothesis of not having the Schnitzler syndrome at 95% CL. End of story.”

Caveat: There are hidden trials
● There might be 20 other people having done the test, none of them having the disease. Still, one of them will get a positive result on average.
● Maybe you also did tests for other diseases.
● Maybe nobody has had the Schnitzler syndrome in the last 100 years.
→ The chance that you actually have the disease could still be very low.

One could account for hidden trials by making abstract statements about the frequency of wrong and right statistical statements (instead of observations). This is exactly what Bayesian inference forces us to do from the start.
Typical Bayesian questions
There is a new test for the Schnitzler syndrome with the following characteristics:

● 5% false positives
● 10% false negatives

You order the test online, and get a positive result. Should you be worried?

The Bayesian says: “What are the priors?”

Prior: 1:100000 persons have the disease, i.e. $P(\mathrm{disease}) = 10^{-5}$.

Bayes' theorem: $P(\mathrm{disease}|+) = \frac{P(+|\mathrm{disease})\, P(\mathrm{disease})}{P(+)}$

Global likelihood: $P(+) = P(+|\mathrm{disease})\, P(\mathrm{disease}) + P(+|\mathrm{healthy})\, P(\mathrm{healthy}) = 0.9 \times 10^{-5} + 0.05 \times (1 - 10^{-5}) \approx 0.05$

This yields a very low posterior probability: $P(\mathrm{disease}|+) \approx \frac{0.9 \times 10^{-5}}{0.05} \approx 1.8 \times 10^{-4}$
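The same computation as a short Python sketch (numbers as on the slide; the variable names are my own):

# Posterior probability of having the disease, given a positive test.
p_pos_disease = 0.90    # P(+ | disease) = 1 - false-negative rate
p_pos_healthy = 0.05    # P(+ | healthy) = false-positive rate
prior = 1e-5            # P(disease) = 1 / 100000

# Global likelihood (evidence): P(+)
evidence = p_pos_disease * prior + p_pos_healthy * (1 - prior)

# Bayes' theorem: P(disease | +)
posterior = p_pos_disease * prior / evidence
print(f"P(disease | positive test) = {posterior:.2e}")   # ~ 1.8e-04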


Pros and Cons of the two approaches

Frequentist
Pro:
● No prior dependence (what is the prior for a flat Universe?)
● Objective procedure
● Clear interpretation of results
Con:
● Be aware of hidden trials
● Publication bias, many researchers
● Many ways of calculating p-values
● “Frequencies” refer to repeated experiments with exactly the same conditions

Bayesian
Pro:
● Prior dependence is formalized
● Reasoning about causes, not observations
● Hard to use completely wrong (this is a conjecture to be tested in this course)
Con:
● Results are difficult to show in a prior-independent way
● Some people think it is “esoteric”
● Difficult for non-parametric studies
Basic definitions I
● A characteristic of a system is said to be random when it is not known or cannot be predicted with complete certainty.
● The degree of randomness can be quantified with the concept of probability (or frequencies, in the Frequentist sense).
● The sample space consists of a certain set of elements that are the values or properties that a random variable can acquire.
● The probability distribution function describes the probability (either as frequency or subjective probability) that a certain value is realized:
  Probability mass function (PMF), for discrete variables: $P(X = x_i | H)$
  Probability density function (PDF), for continuous variables: $P(x \leq X \leq x + dx | H) = f(x|H)\, dx$
● In general, it depends on prior assumptions and hypotheses, here summarized as H.


Basic definitions II
● Mean value for discrete or continuous distributions: $\mu = E[x] = \sum_i x_i\, P(x_i)$ or $\mu = \int x\, f(x)\, dx$
● Variance and standard deviation: $V[x] = E[(x - \mu)^2] = E[x^2] - \mu^2$ and $\sigma = \sqrt{V[x]}$
● Covariance: $\mathrm{cov}(x, y) = E[(x - \mu_x)(y - \mu_y)]$
● Median $x_m$: defined by $P(x \leq x_m) = 1/2$
● Mode: the value of x at which the PMF/PDF is maximal
● Skewness: $\gamma = E[(x - \mu)^3] / \sigma^3$
● n-th central moment: $\mu_n = E[(x - \mu)^n]$
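These quantities are straightforward to estimate from a Monte Carlo sample; below is a minimal sketch (the exponential test sample and the use of numpy/scipy are my own choices, not part of the lecture):

# Estimate the summary statistics above from 100000 samples of an
# exponential distribution with scale 2 (mean 2, variance 4, skewness 2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=100_000)

mean = np.mean(x)                    # E[x]
var = np.var(x)                      # E[(x - mean)^2]
std = np.sqrt(var)                   # standard deviation
median = np.median(x)                # P(x <= median) = 1/2, here 2 ln 2
skewness = stats.skew(x)             # E[(x - mean)^3] / std^3
mu3 = stats.moment(x, moment=3)      # third central moment

print(mean, var, std, median, skewness, mu3)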


Important discrete distributions
Bernoulli distribution
A single yes/no question, answered yes (1) with probability p: $P(k; p) = p^k (1 - p)^{1 - k}$ for $k \in \{0, 1\}$
● Example: Throw of a biased coin

Binomial distribution
Number of successes k in a draw of n elements with individual success probability p: $P(k; n, p) = \binom{n}{k}\, p^k (1 - p)^{n - k}$
● Example: Draw of colored beans from a large bin


The Poisson distribution

Poisson distribution: $P(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$

● Example: Number of detected photons from a radioactive source
● Note: The sum of N Poisson-distributed random variables is Poisson distributed, with mean $\lambda = \sum_{i=1}^{N} \lambda_i$
● Follows from the binomial distribution in the limit $n \to \infty$, $p \to 0$ with $np = \lambda$ fixed (see the sketch below)
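A minimal numerical check of this limit (my own sketch; the choice λ = 3 is arbitrary):

# Binomial(n, p) approaches Poisson(lambda) for n -> infinity, p -> 0
# with lambda = n * p held fixed.
import numpy as np
from scipy.stats import binom, poisson

lam = 3.0
k = np.arange(10)
for n in (10, 100, 10_000):
    p = lam / n
    # maximum difference between the two PMFs over k = 0..9
    print(n, np.max(np.abs(binom.pmf(k, n, p) - poisson.pmf(k, lam))))
# The difference shrinks as n grows, as claimed above.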


Normal and chi-squared distribution
Gaussian / bell curve / Laplacian / normal distribution: $f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$

Notes:
● Its central importance comes from the central limit theorem
● Many random variables are normally distributed in practice, but the reasons for that are often complex.
● “Everybody believes in the law of errors, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an experimental fact.” (Lippmann; Barlow p. 36)

Chi-squared distribution: $f(z; n) = \frac{z^{n/2 - 1}\, e^{-z/2}}{2^{n/2}\, \Gamma(n/2)}$

● Is defined as the distribution of the sum of squares of n normally distributed variables (see the sketch below)
● Describes the statistical distribution of outliers
● Important because of Wilks' theorem
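A minimal sketch verifying the defining property (my own illustration; the Kolmogorov–Smirnov check is an arbitrary choice):

# The sum of squares of n standard-normal variables follows a
# chi-squared distribution with n degrees of freedom.
import numpy as np
from scipy.stats import chi2, kstest

rng = np.random.default_rng(0)
n = 5                                          # degrees of freedom
samples = (rng.standard_normal((100_000, n)) ** 2).sum(axis=1)

print(samples.mean(), samples.var())           # chi2(n): mean = n, var = 2n
print(kstest(samples, chi2(df=n).cdf))         # consistent with chi2(5)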
Multivariate normal distribution
“Rotated” normal distributions
● The PDF is similar to the 1-dim case, with mean $\vec{\mu}$ and covariance matrix $\Sigma$: $f(\vec{x}; \vec{\mu}, \Sigma) = \frac{1}{(2\pi)^{k/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})\right)$
● The two-dimensional case with variables x, y: $f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left(-\frac{1}{2(1 - \rho^2)} \left[\frac{(x - \mu_x)^2}{\sigma_x^2} + \frac{(y - \mu_y)^2}{\sigma_y^2} - \frac{2\rho (x - \mu_x)(y - \mu_y)}{\sigma_x \sigma_y}\right]\right)$, where ρ is the correlation between x and y.
● Note that each $x_i$ individually is normally distributed, with variance $\sigma_i^2$ (see the sketch below).
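A short sketch of these properties (my own illustration; the parameter values are arbitrary):

# Sample a correlated 2-d normal distribution and check that the means,
# marginal standard deviations and the correlation come out as specified.
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
sigma_x, sigma_y, rho = 2.0, 0.5, 0.8
cov = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                [rho * sigma_x * sigma_y, sigma_y**2]])

xy = rng.multivariate_normal(mu, cov, size=200_000)
print(xy.mean(axis=0))                    # ~ [1, -2]
print(xy[:, 0].std(), xy[:, 1].std())     # ~ sigma_x, sigma_y (marginals)
print(np.corrcoef(xy.T)[0, 1])            # ~ rho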
Other useful distributions
Exponential distribution: $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \geq 0$
Uniform distribution: $f(x; a, b) = \frac{1}{b - a}$ for $a \leq x \leq b$, else 0

Log-normal distribution
● ln(x) is normally distributed

Cauchy distribution / Breit-Wigner distribution: $f(x; x_0, \Gamma) = \frac{1}{\pi}\, \frac{\Gamma/2}{(x - x_0)^2 + (\Gamma/2)^2}$
● The proper Cauchy distribution is obtained for $x_0 = 0$, $\Gamma = 2$
● Though this distribution is omnipresent in particle physics, its convergence behavior is extremely bad: mean and variance are undefined, and the central limit theorem does not apply (see the sketch below)

Student's t-distribution: $f(t; \nu) = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu \pi}\, \Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$
● Generalization of the unit normal distribution when the variance is estimated from data
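The bad convergence behavior is easy to see numerically; a minimal sketch (my own illustration):

# Sample means of a normal distribution converge as 1/sqrt(n);
# sample means of a Cauchy distribution do not converge at all.
import numpy as np

rng = np.random.default_rng(2)
for n in (100, 10_000, 1_000_000):
    print(n,
          rng.standard_normal(n).mean(),   # shrinks roughly as 1/sqrt(n)
          rng.standard_cauchy(n).mean())   # stays O(1): mean is undefined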
Many interrelations
Sum rules
● Poisson + Poisson = Poisson
● Normal + Normal = Normal
● Chi-squared + chi-squared = chi-squared
● Cauchy + Cauchy = Cauchy

Important approximations
● Binomial → Poisson ($n \to \infty$, $p \to 0$, $np$ fixed)
● Poisson → Normal (large λ)
● Chi-squared → Normal (large n)
● Student's t → standard Normal ($\nu \to \infty$)

[Chart of distribution relations: solid arrows denote exact relations, dashed arrows denote approximations. From http://www.johndcook.com/blog/distribution_chart/]
The central role of the normal distribution
The Central Limit Theorem (CLT)

The sum of n independent continuous random variables $x_i$, with means $\mu_i$ and variances $\sigma_i^2$, becomes a Gaussian random variable with mean $\mu = \sum_i \mu_i$ and variance $\sigma^2 = \sum_i \sigma_i^2$ in the limit $n \to \infty$.

[Figure: example of a summed chi-squared distribution compared with its Gaussian approximation; the tails, where the two differ by orders of magnitude, are better visible in log scale.]

Notes:
● The CLT holds for a very large number of underlying distributions, but for some it completely fails.
● In general, the CLT works better at the center of the distribution than far away from the center.

Warning:
● Even if the center of the summed distribution is indistinguishable from a Gaussian, there might be very large deviations (orders of magnitude!) in the “wings” or “tails” (see the sketch below).
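A minimal numerical sketch of this warning (my own illustration; summing unit exponentials is convenient because their sum is exactly gamma distributed):

# Compare the exact distribution of a sum of n exponential variables
# (a gamma distribution) with its CLT Gaussian approximation.
import numpy as np
from scipy.stats import gamma, norm

n = 20                                   # number of summed exponentials
exact = gamma(a=n)                       # exact: gamma(n), mean n, var n
approx = norm(loc=n, scale=np.sqrt(n))   # CLT approximation

for x in (n, n + 3 * np.sqrt(n), n + 6 * np.sqrt(n)):
    # Survival functions P(X > x): close at the center, off in the tail
    print(f"x = {x:6.1f}  exact = {exact.sf(x):.2e}  gauss = {approx.sf(x):.2e}")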
Proof of the Central Limit Theorem
We are interested in the PDF of the sum of variables with distributions f, g, h, which is given by the convolution
$$p(x) = \int du\, dv\, dw\; f(u)\, g(v)\, h(w)\; \delta(x - u - v - w)\;.$$

A few useful definitions:

A) The characteristic function:
$$\phi_f(k) = E[e^{ikx}] = \int e^{ikx} f(x)\, dx$$
Note: Convolutions simplify to multiplications, $\phi_{f*g}(k) = \phi_f(k)\, \phi_g(k)$.

B) Cumulants $\kappa_n$, defined by the Taylor expansion of the log of the characteristic function:
$$\ln \phi(k) = \sum_{n=1}^{\infty} \kappa_n \frac{(ik)^n}{n!}$$

The first three cumulants are functions of the mean, the variance and the skew: $\kappa_1 = \mu$ (mean), $\kappa_2 = \sigma^2$ (variance), $\kappa_3 = \gamma \sigma^3$ (skew). For a normal distribution, $\kappa_n = 0$ for all $n \geq 3$.

Adding N identical distributions with cumulants $\kappa_n$ implies (with notational simplifications) $\kappa_n^{\mathrm{sum}} = N \kappa_n$.

To see what this means, we rescale x such that the variance equals one, $x \to x / \sqrt{N \kappa_2}$; under a rescaling $x \to \lambda x$ the cumulants transform as $\kappa_n \to \lambda^n \kappa_n$. This implies
$$\kappa_n^{\mathrm{rescaled}} = \frac{N \kappa_n}{(N \kappa_2)^{n/2}} = \frac{\kappa_n}{\kappa_2^{n/2}}\, N^{1 - n/2}\,,$$
and in the large N limit $\kappa_n^{\mathrm{rescaled}} \to 0$ for all $n \geq 3$.

Hence, for large enough values of N, only the first two cumulants are important, and the rescaled sum approaches a normal distribution (see the sketch below).
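The decay of the rescaled third cumulant can be checked by simulation; a minimal sketch (my own illustration, using unit exponentials, whose skewness is 2):

# The skewness (rescaled third cumulant) of a sum of N iid variables
# falls off as 1/sqrt(N), so the rescaled sum approaches a Gaussian.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)
for N in (1, 10, 100):
    # 50000 realizations of the sum of N unit-exponential variables
    sums = rng.exponential(size=(50_000, N)).sum(axis=1)
    print(N, skew(sums))    # expect ~ 2 / sqrt(N)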
