
Advanced Statistical Methods

Lecture 1
Homework and Exam

Homework assignments
● 2 × 2-hour TA sessions per week (Tuesday & Thursday, 11–13h, same room)
● Homework is handed out at the beginning of a TA session and should be handed in one week later, at the end of the TA session
● Help on the homework is provided during TA sessions
● Exercises require analytic work as well as numerical work on the computer
● Homework can be handed in hand-written, or sent via email (PDF)
● For numerical work, programs should be written as IPython notebooks and sent via email. They should “run out of the box” to receive full points.

Exam
● There will be a written exam in the last session, on Thursday 29th January
● The total grade depends on both the homework assignments (60%) and the exam (40%)

Contact
● Christoph Weniger (c.weniger@uva.nl)

● Michael Feyereisen (m.r.feyereisen@uva.nl)

● Richard Bartels (richard.t.bartels@gmail.com)

Slides & homework: https://staff.fnwi.uva.nl/c.weniger/


Later: Blackboard
Recommended Literature

Glen Cowan, Statistical Data Analysis, Oxford Science Publications, 1998
● Frequentist analysis, well known in particle physics
● Bonus: Monte Carlo methods and unfolding

R. J. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, The Manchester Physics Series, 1988
● Traditional and very good book on Frequentist data analysis

P. Gregory, Bayesian Logical Data Analysis for the Physical Sciences, Cambridge University Press, 2005
● Bayesian “bible”, conceptual introduction
● Many examples

J. V. Wall and C. R. Jenkins, Practical Statistics for Astronomers, Cambridge Observing Handbooks for Research Astronomers, 2003
● Practical book for data analysis in astronomy (both Frequentist and Bayesian)
● Many examples
Connection to Phil's Course

● Since most of you attended the course by Phil Uttley on Statistical Methods in Astrophysics and Astronomy (SM), I will assume that you know many of the basics, and continue from there.
● SM was based on Simon Vaughan's book Scientific Inference: Learning from Data.
● In the first week of the present course, we will briefly review some of the most relevant material from SM, as a reminder and to provide context for the rest of the course.
Understanding statistical tools matters
This course is about why statistical methods work.

● In cases where the experimental result is clear, the details of the statistical method often do not matter.
● In many cases, it is enough to apply standard statistical recipes (normal distribution, error propagation) to get reasonable results.
● However, when describing weak signals, close to the experimental threshold, the details of the statistical method are crucial.
● Assumptions underlying the standard recipes might be violated.
● It is important to understand not only how, but why statistical inference works. This is what the present course aims to do.
Overview

● Introduction: Bayesian and Frequentist statistics
● Probability distribution functions & central limit theorem
● Frequentist analyses (first four lectures)
  ● Hypothesis testing
  ● Estimators
  ● Confidence intervals & Wilks' theorem
  ● Profile likelihood technique & pitfalls
  ● Trial factors & coverage
  ● Numerical minimizers
● Bayesian analyses
  ● Basics: evidence, model selection, credible intervals
  ● Priors: flat prior, Jeffreys' prior, non-informative priors
  ● Sampling techniques: Markov Chain Monte Carlo, MultiNest
● Applications & advanced material
  ● Principal component analysis
  ● Angular power spectrum
  ● Bootstrapping and jackknife
  ● ...
The two grand schools of statistical analysis

Frequentist (Fisher)
● Deductive logic: cause → phenomena
● Based on “frequencies” of phenomena
● Central quantity: the “p-value”
● “Given a cause, what is the frequency (in repeated experiments) of a certain phenomenon to occur?”

Bayesian (Bayes)
● Inductive logic: phenomena → possible causes
● Probabilistic extension of logic
● Central quantity: the “posterior distribution”
● “How does an observed phenomenon change my belief in different possible causes?”
Probabilities in a nutshell
“Probabilities” here mean
● Frequencies of events (in 1/6 of the cases the die shows a six)
● Plausibility of, or belief in, a proposition (the belief in “The Higgs boson exists.”)

The most relevant rules
● Degrees of plausibility are represented by real numbers between 0 (not realized) and 1 (realized)
● Probabilities for mutually exclusive and exhaustive elementary events/propositions sum to one: $\sum_i P(A_i|I) = 1$ (I indicates background information)
● An event/proposition is either true or false (inference in binary logic)
● Structural consistency (the result does not depend on the way of reasoning) is guaranteed by the rule for conditional probabilities: $P(A, B|I) = P(A|B, I)\, P(B|I)$
● Elementary events/propositions follow the rules of set theory

(For a full discussion and derivation from fundamental requirements for consistent reasoning, see Chapter 2.5 in Gregory. A small numerical check of the two rules follows below.)
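A minimal sketch of the sum and product rules (my own illustration, not from the slides), using a fair die as the sample space:

# Verify the sum rule and the conditional-probability (product) rule
# on a fair die. Events A = "even", B = "greater than three".
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]                 # sample space
P = {x: Fraction(1, 6) for x in omega}     # elementary probabilities

# Sum rule: mutually exclusive, exhaustive outcomes sum to one
assert sum(P.values()) == 1

def prob(event):
    """Probability of an event given as a set of elementary outcomes."""
    return sum(P[x] for x in event)

A, B = {2, 4, 6}, {4, 5, 6}
P_A_and_B = prob(A & B)                    # 1/3
P_B = prob(B)                              # 1/2
P_A_given_B = P_A_and_B / P_B              # 2/3

# Product rule: P(A, B) = P(A | B) P(B)
assert P_A_and_B == P_A_given_B * P_B
print(P_A_and_B, P_A_given_B, P_B)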
Bayes' Theorem

$$P(H|D, I) = \frac{P(D|H, I)\, P(H|I)}{P(D|I)}$$

Here $P(H|D, I)$ is the posterior, $P(D|H, I)$ the likelihood function, $P(H|I)$ the prior, and $P(D|I)$ the model evidence or global likelihood.

It is a direct consequence of the rule for conditional probabilities: $P(H, D|I) = P(H|D, I)\, P(D|I) = P(D|H, I)\, P(H|I)$.

Notes:
● Bayes' theorem provides a rule for how to update the probability or plausibility of a certain hypothesis H to be true in light of data D. This always depends on additional background information I, which is often not made explicit.
● Frequentists are interested in likelihood functions only. The likelihood is in general not equal to the posterior, which is most obvious looking at the normalization of the functions (with x and θ being data and model parameters, respectively): $\int P(x|\theta)\, dx = 1$, but in general $\int P(x|\theta)\, d\theta \neq 1$, whereas $\int P(\theta|x)\, d\theta = 1$ (see the sketch below).
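A minimal numerical illustration of this normalization remark (my own sketch, not from the slides), using a binomial likelihood with hypothetical data:

# A binomial likelihood P(k|p) sums to one over the data k,
# but its integral over the parameter p is not one.
from scipy.stats import binom
from scipy.integrate import quad

n, k = 10, 3                   # hypothetical data: 3 successes in 10 trials

# Sum over all possible data outcomes k, at fixed success probability p:
p_fixed = 0.4
print(sum(binom.pmf(kk, n, p_fixed) for kk in range(n + 1)))   # -> 1.0

# Integral over the parameter p, at fixed observed data k:
integral, _ = quad(lambda p: binom.pmf(k, n, p), 0, 1)
print(integral)                # -> 1/(n+1) = 0.0909..., not 1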
Typical Frequentist questions
There is a new test for the Schnitzler syndrome with the following characteristics:

● 5% false positives
● 10% false negatives

You order the test online, and get a positive result. Should you be worried?

The Frequentist says:
“I can exclude the null hypothesis of not having the Schnitzler syndrome at 95% CL. End of story.”

Caveat: There are hidden trials
● There might be 20 other people having done the test, none of them having the disease. Still, one of them will get a positive result on average.
● Maybe you also did tests for other diseases.
● Maybe nobody has had the Schnitzler syndrome in the last 100 years.
→ The chance that you actually have the disease could still be very low.

One could account for hidden trials by making abstract statements about the frequency of wrong and right statistical statements (instead of observations). This is exactly what Bayesian inference forces us to do from the start.
Typical Bayesian questions
There is a new test for the Schnitzler syndrome with the following characteristics:

● 5% false positives
● 10% false negatives

You order the test online, and get a positive result. Should you be worried?

The Bayesian says: “What are the priors?”

Prior: 1:100000 persons have the disease, i.e. $P(\mathrm{disease}) = 10^{-5}$.

Bayes' theorem: $P(\mathrm{disease}|+) = \frac{P(+|\mathrm{disease})\, P(\mathrm{disease})}{P(+)}$

Global likelihood: $P(+) = P(+|\mathrm{disease})\, P(\mathrm{disease}) + P(+|\mathrm{healthy})\, P(\mathrm{healthy}) = 0.9 \times 10^{-5} + 0.05 \times (1 - 10^{-5}) \approx 0.05$

This yields a very low posterior probability: $P(\mathrm{disease}|+) \approx \frac{0.9 \times 10^{-5}}{0.05} \approx 1.8 \times 10^{-4}$
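The same computation as a short Python sketch (numbers as on the slide; the variable names are my own):

# Posterior probability of having the disease, given a positive test.
p_pos_disease = 0.90    # P(+ | disease) = 1 - false-negative rate
p_pos_healthy = 0.05    # P(+ | healthy) = false-positive rate
prior = 1e-5            # P(disease) = 1 / 100000

# Global likelihood (evidence): P(+)
evidence = p_pos_disease * prior + p_pos_healthy * (1 - prior)

# Bayes' theorem: P(disease | +)
posterior = p_pos_disease * prior / evidence
print(f"P(disease | positive test) = {posterior:.2e}")   # ~ 1.8e-04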


Pros and Cons of the two approaches

Frequentist
Pro:
● No prior dependence (what is the prior for a flat Universe?)
● Objective procedure
● Clear interpretation of results
Con:
● Be aware of hidden trials
● Publication bias, many researchers
● Many ways of calculating p-values
● “Frequencies” refer to repeated experiments with exactly the same conditions

Bayesian
Pro:
● Prior dependence is formalized
● Reasoning about causes, not observations
● Hard to use completely wrong (this is a conjecture to be tested in this course)
Con:
● Results are difficult to show in a prior-independent way
● Some people think it is “esoteric”
● Difficult for non-parametric studies
Basic definitions I
● A characteristic of a system is said to be random when it is not known or cannot be predicted with complete certainty.
● The degree of randomness can be quantified with the concept of probability (or frequencies, in the Frequentist sense).
● The sample space consists of a certain set of elements that are the values or properties that a random variable can acquire.
● The probability distribution function describes the probability (either as frequency or subjective probability) that a certain value is realized:
  Probability mass function (PMF), for discrete variables: $P(X = x_i | H)$
  Probability density function (PDF), for continuous variables: $P(x \leq X \leq x + dx | H) = f(x|H)\, dx$
● In general, it depends on prior assumptions and hypotheses, here summarized as H.


Basic definitions II
● Mean value for discrete or continuous distributions: $\mu = E[x] = \sum_i x_i\, P(x_i)$ or $\mu = \int x\, f(x)\, dx$
● Variance and standard deviation: $V[x] = E[(x - \mu)^2] = E[x^2] - \mu^2$ and $\sigma = \sqrt{V[x]}$
● Covariance: $\mathrm{cov}(x, y) = E[(x - \mu_x)(y - \mu_y)]$
● Median $x_m$: defined by $P(x \leq x_m) = 1/2$
● Mode: the value of x at which the PMF/PDF is maximal
● Skewness: $\gamma = E[(x - \mu)^3] / \sigma^3$
● n-th central moment: $\mu_n = E[(x - \mu)^n]$
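These quantities are straightforward to estimate from a Monte Carlo sample; below is a minimal sketch (the exponential test sample and the use of numpy/scipy are my own choices, not part of the lecture):

# Estimate the summary statistics above from 100000 samples of an
# exponential distribution with scale 2 (mean 2, variance 4, skewness 2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=100_000)

mean = np.mean(x)                    # E[x]
var = np.var(x)                      # E[(x - mean)^2]
std = np.sqrt(var)                   # standard deviation
median = np.median(x)                # P(x <= median) = 1/2, here 2 ln 2
skewness = stats.skew(x)             # E[(x - mean)^3] / std^3
mu3 = stats.moment(x, moment=3)      # third central moment

print(mean, var, std, median, skewness, mu3)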


Important discrete distributions
Bernoulli distribution
A single yes/no question, answered yes (1) with probability p: $P(k; p) = p^k (1 - p)^{1 - k}$ for $k \in \{0, 1\}$
● Example: Throw of a biased coin

Binomial distribution
Number of successes k in a draw of n elements with individual success probability p: $P(k; n, p) = \binom{n}{k}\, p^k (1 - p)^{n - k}$
● Example: Draw of colored beans from a large bin


The Poisson distribution

Poisson distribution: $P(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$

● Example: Number of detected photons from a radioactive source
● Note: The sum of N Poisson-distributed random variables is Poisson distributed, with mean $\lambda = \sum_{i=1}^{N} \lambda_i$
● Follows from the binomial distribution in the limit $n \to \infty$, $p \to 0$ with $np = \lambda$ fixed (see the sketch below)
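A minimal numerical check of this limit (my own sketch; the choice λ = 3 is arbitrary):

# Binomial(n, p) approaches Poisson(lambda) for n -> infinity, p -> 0
# with lambda = n * p held fixed.
import numpy as np
from scipy.stats import binom, poisson

lam = 3.0
k = np.arange(10)
for n in (10, 100, 10_000):
    p = lam / n
    # maximum difference between the two PMFs over k = 0..9
    print(n, np.max(np.abs(binom.pmf(k, n, p) - poisson.pmf(k, lam))))
# The difference shrinks as n grows, as claimed above.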


Normal and chi-squared distribution
Gaussian / bell curve / Laplacian / normal distribution: $f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$

Notes:
● Its central importance comes from the central limit theorem
● Many random variables are normally distributed in practice, but the reasons for that are often complex.
● “Everybody believes in the law of errors, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an experimental fact.” (Lippmann; Barlow p. 36)

Chi-squared distribution: $f(z; n) = \frac{z^{n/2 - 1}\, e^{-z/2}}{2^{n/2}\, \Gamma(n/2)}$

● Is defined as the distribution of the sum of squares of n normally distributed variables (see the sketch below)
● Describes the statistical distribution of outliers
● Important because of Wilks' theorem
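A minimal sketch verifying the defining property (my own illustration; the Kolmogorov–Smirnov check is an arbitrary choice):

# The sum of squares of n standard-normal variables follows a
# chi-squared distribution with n degrees of freedom.
import numpy as np
from scipy.stats import chi2, kstest

rng = np.random.default_rng(0)
n = 5                                          # degrees of freedom
samples = (rng.standard_normal((100_000, n)) ** 2).sum(axis=1)

print(samples.mean(), samples.var())           # chi2(n): mean = n, var = 2n
print(kstest(samples, chi2(df=n).cdf))         # consistent with chi2(5)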
Multivariate normal distribution
“Rotated” normal distributions
● The PDF is similar to the 1-dim case, with mean $\vec{\mu}$ and covariance matrix $\Sigma$: $f(\vec{x}; \vec{\mu}, \Sigma) = \frac{1}{(2\pi)^{k/2}\, |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})\right)$
● The two-dimensional case with variables x, y: $f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left(-\frac{1}{2(1 - \rho^2)} \left[\frac{(x - \mu_x)^2}{\sigma_x^2} + \frac{(y - \mu_y)^2}{\sigma_y^2} - \frac{2\rho (x - \mu_x)(y - \mu_y)}{\sigma_x \sigma_y}\right]\right)$, where ρ is the correlation between x and y.
● Note that each $x_i$ individually is normally distributed, with variance $\sigma_i^2$ (see the sketch below).
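A short sketch of these properties (my own illustration; the parameter values are arbitrary):

# Sample a correlated 2-d normal distribution and check that the means,
# marginal standard deviations and the correlation come out as specified.
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
sigma_x, sigma_y, rho = 2.0, 0.5, 0.8
cov = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                [rho * sigma_x * sigma_y, sigma_y**2]])

xy = rng.multivariate_normal(mu, cov, size=200_000)
print(xy.mean(axis=0))                    # ~ [1, -2]
print(xy[:, 0].std(), xy[:, 1].std())     # ~ sigma_x, sigma_y (marginals)
print(np.corrcoef(xy.T)[0, 1])            # ~ rho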
Other useful distributions
Exponential distribution: $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \geq 0$
Uniform distribution: $f(x; a, b) = \frac{1}{b - a}$ for $a \leq x \leq b$, else 0

Log-normal distribution
● ln(x) is normally distributed

Cauchy distribution / Breit-Wigner distribution: $f(x; x_0, \Gamma) = \frac{1}{\pi}\, \frac{\Gamma/2}{(x - x_0)^2 + (\Gamma/2)^2}$
● The proper Cauchy distribution is obtained for $x_0 = 0$, $\Gamma = 2$
● Though this distribution is omnipresent in particle physics, its convergence behavior is extremely bad: mean and variance are undefined, and the central limit theorem does not apply (see the sketch below)

Student's t-distribution: $f(t; \nu) = \frac{\Gamma\!\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu \pi}\, \Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$
● Generalization of the unit normal distribution when the variance is estimated from data
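The bad convergence behavior is easy to see numerically; a minimal sketch (my own illustration):

# Sample means of a normal distribution converge as 1/sqrt(n);
# sample means of a Cauchy distribution do not converge at all.
import numpy as np

rng = np.random.default_rng(2)
for n in (100, 10_000, 1_000_000):
    print(n,
          rng.standard_normal(n).mean(),   # shrinks roughly as 1/sqrt(n)
          rng.standard_cauchy(n).mean())   # stays O(1): mean is undefined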
Many interrelations
Sum rules
● Poisson + Poisson = Poisson
● Normal + Normal = Normal
● Chi-squared + chi-squared = chi-squared
● Cauchy + Cauchy = Cauchy

Important approximations
● Binomial → Poisson ($n \to \infty$, $p \to 0$, $np$ fixed)
● Poisson → Normal (large λ)
● Chi-squared → Normal (large n)
● Student's t → standard Normal ($\nu \to \infty$)

[Chart of distribution relations: solid arrows denote exact relations, dashed arrows denote approximations. From http://www.johndcook.com/blog/distribution_chart/]
The central role of the normal distribution
The Central Limit Theorem (CLT)

The sum of n independent continuous random variables $x_i$, with means $\mu_i$ and variances $\sigma_i^2$, becomes a Gaussian random variable with mean $\mu = \sum_i \mu_i$ and variance $\sigma^2 = \sum_i \sigma_i^2$ in the limit $n \to \infty$.

[Figure: example of a summed chi-squared distribution compared with its Gaussian approximation; the tails, where the two differ by orders of magnitude, are better visible in log scale.]

Notes:
● The CLT holds for a very large number of underlying distributions, but for some it completely fails.
● In general, the CLT works better at the center of the distribution than far away from the center.

Warning:
● Even if the center of the summed distribution is indistinguishable from a Gaussian, there might be very large deviations (orders of magnitude!) in the “wings” or “tails” (see the sketch below).
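A minimal numerical sketch of this warning (my own illustration; summing unit exponentials is convenient because their sum is exactly gamma distributed):

# Compare the exact distribution of a sum of n exponential variables
# (a gamma distribution) with its CLT Gaussian approximation.
import numpy as np
from scipy.stats import gamma, norm

n = 20                                   # number of summed exponentials
exact = gamma(a=n)                       # exact: gamma(n), mean n, var n
approx = norm(loc=n, scale=np.sqrt(n))   # CLT approximation

for x in (n, n + 3 * np.sqrt(n), n + 6 * np.sqrt(n)):
    # Survival functions P(X > x): close at the center, off in the tail
    print(f"x = {x:6.1f}  exact = {exact.sf(x):.2e}  gauss = {approx.sf(x):.2e}")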
Proof of the Central Limit Theorem
We are interested in the PDF of the sum of variables with distributions f, g, h, which is given by the convolution
$$p(x) = \int du\, dv\, dw\; f(u)\, g(v)\, h(w)\; \delta(x - u - v - w)\;.$$

A few useful definitions:

A) The characteristic function:
$$\phi_f(k) = E[e^{ikx}] = \int e^{ikx} f(x)\, dx$$
Note: Convolutions simplify to multiplications, $\phi_{f*g}(k) = \phi_f(k)\, \phi_g(k)$.

B) Cumulants $\kappa_n$, defined by the Taylor expansion of the log of the characteristic function:
$$\ln \phi(k) = \sum_{n=1}^{\infty} \kappa_n \frac{(ik)^n}{n!}$$

The first three cumulants are functions of the mean, the variance and the skew: $\kappa_1 = \mu$ (mean), $\kappa_2 = \sigma^2$ (variance), $\kappa_3 = \gamma \sigma^3$ (skew). For a normal distribution, $\kappa_n = 0$ for all $n \geq 3$.

Adding N identical distributions with cumulants $\kappa_n$ implies (with notational simplifications) $\kappa_n^{\mathrm{sum}} = N \kappa_n$.

To see what this means, we rescale x such that the variance equals one, $x \to x / \sqrt{N \kappa_2}$; under a rescaling $x \to \lambda x$ the cumulants transform as $\kappa_n \to \lambda^n \kappa_n$. This implies
$$\kappa_n^{\mathrm{rescaled}} = \frac{N \kappa_n}{(N \kappa_2)^{n/2}} = \frac{\kappa_n}{\kappa_2^{n/2}}\, N^{1 - n/2}\,,$$
and in the large N limit $\kappa_n^{\mathrm{rescaled}} \to 0$ for all $n \geq 3$.

Hence, for large enough values of N, only the first two cumulants are important, and the rescaled sum approaches a normal distribution (see the sketch below).
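The decay of the rescaled third cumulant can be checked by simulation; a minimal sketch (my own illustration, using unit exponentials, whose skewness is 2):

# The skewness (rescaled third cumulant) of a sum of N iid variables
# falls off as 1/sqrt(N), so the rescaled sum approaches a Gaussian.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)
for N in (1, 10, 100):
    # 50000 realizations of the sum of N unit-exponential variables
    sums = rng.exponential(size=(50_000, N)).sum(axis=1)
    print(N, skew(sums))    # expect ~ 2 / sqrt(N)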
